This paper generalizes recent proposals of density forecasting
models and develops theory for this class of models. In density
forecasting, the density of observations is estimated in regions where
the density is not observed. Identification of the density in such regions
is guaranteed by structural assumptions on the density that
allow exact extrapolation. In this paper, the structural assumption
is made that the density is a product of one-dimensional functions.

The theory is quite general in assuming the shape of the region where 
the density is observed. Such models naturally arise when the time 
point of an observation can be written as the sum of two terms (e.g., 
onset and incubation period of a disease). The developed theory also 
allows for a multiplicative factor of seasonal effects. Seasonal effects
are present in many actuarial, biostatistical, econometric and statistical
studies. Smoothing estimators are proposed that are based on
backfitting. Full asymptotic theory is derived for them. A practical
example from the insurance business is given, producing a within-year
budget of reported insurance claims. A small sample study supports
the theoretical results.

1. Introduction. In-sample density forecasting is defined in this paper as
forecasting a structured density in regions where the density is not observed.
This is possible when the density is structured in such a way that all entering
components are estimable in-sample. Let us, for example, assume that we
have one covariate $X$ representing the start of something; it could be the onset
of some infection, the underwriting of an insurance contract, the reporting
of an insurance claim, the birth of a new member of a cohort or an employee
losing a job in the labour market. Let $Y$ then represent the development or
delay to some event from this starting point. It could be the incubation period
of some disease, the development of an insurance claim, the age of a cohort member
or the time spent looking for a new job. Then $X + Y$ is the calendar time of the
relevant event. This event is observed if and only if it has already happened
by a calendar time, say $t_0$. The forecasting exercise is about predicting
the density of future events in calendar times after $t_0$.

The most typical example of a structured density is a simple multiplicative
form studied by Mammen, Martínez-Miranda and Nielsen (2015). The
multiplicative density model assumes that $X$ and $Y$ are independent with
smooth densities $f$ and $g$. When $f$ and $g$ are estimated by histograms,
our in-sample forecasting approach could be formulated via a parametric
model. This version of in-sample density forecasting is omnipresent in academic
studies as well as in business forecasting; see Martínez-Miranda et
al. (2013) for more details and references in insurance and in statistics of
cohort models. Extensions of such parametric histogram type of models can
often be understood as structured density models modelled via histograms.
A structured density is defined as a known function of lower-dimensional
unknown underlying functions; see Mammen and Nielsen (2003) for a formal
definition of generalised structured models. Under the assumption that
the model is true, our forecasts do not extrapolate any parameters or time
series into the future. We therefore call our methodology “in-sample density
forecasting”: a structured density estimator forecasting the future without
further assumptions or approximate extrapolations.

Our model is related to deconvolution, but there are two major differences.
First, in our model one observes not only $X + Y$ but also the summands
$X$ and $Y$. Second, $X$ and $Y$ are only observed if their sum lies in a certain
set, for example, in an interval $(0, t_0]$. This makes $X$ and $Y$ dependent
and turns the estimation problem into an inverse problem. We will see below that
the first difference leads to rates of convergence that coincide with rates for
the estimation of one-dimensional functions in the classical nonparametric
regression and density settings. The reason is that our model consists in a
well-posed inverse problem. In contrast, deconvolution is an ill-posed inverse
problem and allows only poorer rates of convergence.

This paper adds three new contributions to the literature on in-sample
density forecasting. First of all, we define smoothing estimators based on
backfitting and we develop a complete asymptotic distribution theory for
these estimators. Second, we allow for a general class of regions for which
the density is observed. The leading example is a triangle. A triangle arises
in the above examples where the sum of two covariates is bounded by calendar
time. The theoretical discussion in Mammen, Martínez-Miranda and
Nielsen (2015) was restricted to this case. But there exist many other important
support sets; see, for example, Kuang, Nielsen and Nielsen (2008) for a
detailed discussion. Third, we generalize the forecasting model by modelling
a seasonal component. This is done by introducing an additional multiplicative
seasonal factor into the model. Then we have three one-dimensional
density functions that enter the model and that can be estimated in sample.
Seasonal effects are omnipresent: onset of some disease could be more likely
in the winter than in the summer; new jobs might be less likely during the
summer or they may depend on the business cycle; more auto insurance
claims are reported during the winter, but they might be bigger on average
in the summer; cold winters or hot summers affect mortality. When a study
is running over a few years only and one or two of those years are not fully
observed, data might be too sparse to leave these two years out of the study.
Leaving them in might however generate bias. The inclusion of seasonality
in this paper solves this type of problem and allows us in general to do well
when years are not fully observed. An illustration producing a within-year
budget of insurance claims is given in the application section.

Classical actuarial methodology does not include seasonal effects. Budgets 
are normally carried out manually by highly paid actuaries. The automatic 
adjustment of seasonal effects offered by this paper is therefore potentially 
cost saving. Insurance companies currently use the classical chain ladder
technique when forecasting future claims. Classical chain ladder has recently
been identified as being the above-mentioned multiplicative histogram
in-sample forecasting approach; see Martínez-Miranda et al. (2013). The seasonal
adjustment suggested in this paper is therefore directly implementable
in the working routines and processes used by today’s nonlife insurance
companies.

Recent updates of classical chain ladder include Kuang, Nielsen and Nielsen
(2009), Verrall, Nielsen and Jessen (2010), Martínez-Miranda et al. (2011)
and Martínez-Miranda, Nielsen and Verrall (2012). These papers
reinterpreted classical chain ladder in modern mathematical statistical terms.
The generalised structured nonparametric model of this paper is a multiplicative
density with three effects. The third seasonal effect is a function of
the covariates of the first two effects. Estimation is carried out by projecting
an unstructured local linear density estimator, Nielsen (1999), down on
the structure of interest. The seasonal addition to the multiplicative density
model of Mammen, Martínez-Miranda and Nielsen (2015) is still a generalised
additive structure, a simple special case of generalised structured
models. Generalised structured models have historically been more studied
in regression than in density estimation. Future developments of our
in-sample density approach will therefore naturally be related to fundamental
regression models; see Linton and Nielsen (1995), Nielsen and Linton (1998),
Opsomer and Ruppert (1997), Mammen, Linton and Nielsen (1999), Jiang,
Fan and Fan (2010), Mammen and Park (2005, 2006), Nielsen and Sperlich
(2005), Mammen and Nielsen (2003), Yu, Park and Mammen (2008),
Lee, Mammen and Park (2010, 2012, 2014), Zhang, Park and Wang (2013),
among others.

The paper is structured as follows. Section 2 describes our structured
in-sample density forecasting model and shows that the model is identifiable
(estimable) under weak conditions. Section 3 explains a new approach to
the estimation of the model. Here, it is assumed that the data are observed
in continuous time and nonparametric smoothing methods are applied.
Section 4 contains the theoretical properties of our method and Section 5 considers
numerical examples and discusses the performance of the new approach.
The Appendix contains technical details.

2. The model. We observe a random sample $\{(X_i, Y_i): 1 \le i \le n\}$ from a
density $f$ supported on a subset $\mathcal{I}$ of the rectangle $[0,1]^2$. The density $f(x,y)$
is a multiplicative function of three univariate components, where
the first two are functions of the coordinates $x$ and $y$, respectively, and the
third is a function of the sum of the two coordinates, $x + y$, and is periodic.
Specifically, we consider the following multiplicative model:

(2.1)  $f(x,y) = f_1(x)\, f_2(y)\, f_3(m_J(x+y)), \quad (x,y) \in \mathcal{I},$

where $m_J(t) = J \operatorname{mod}_J(t)$ and $\operatorname{mod}_J(t) = t$ modulo $1/J$ for some $J > 0$, that
is, $m_J(t) = J(t - l/J)$ for $l/J \le t < (l+1)/J$, $l = 0, 1, 2, \ldots$. Here, $f_j$ are
unknown nonnegative functions that are bounded away from zero on
their supports. We note that $m_J(t)$ always takes values in $[0,1)$ as $t$ varies
on $\mathbb{R}^+$, and that the third component $f_3(m_J(\cdot))$ is a periodic function with
period $J^{-1}$.
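
As a minimal illustration of the periodic index in model (2.1), the following Python sketch evaluates $m_J$ and the multiplicative density at a point. The function names `m_J` and `density` and the example components are assumptions made here for illustration only; they are not taken from the paper.

```python
import math

def m_J(t: float, J: float) -> float:
    """Rescaled remainder of t modulo 1/J; lies in [0, 1) up to rounding."""
    l = math.floor(t * J)          # index l with l/J <= t < (l+1)/J
    return J * (t - l / J)

def density(x, y, f1, f2, f3, J):
    """Multiplicative form f1(x) * f2(y) * f3(m_J(x + y))."""
    return f1(x) * f2(y) * f3(m_J(x + y, J))

# Hypothetical components, chosen only to exercise the functions.
f1 = lambda u: 1.0
f2 = lambda u: 1.0
f3 = lambda u: math.sin(2 * math.pi * u) + 1.5

value = density(0.2, 0.1, f1, f2, f3, J=2)   # third factor is f3(m_J(0.3, 2))
```

With $J = 2$ the index repeats every half unit of calendar time, so `m_J(0.3, 2)` and `m_J(0.8, 2)` agree up to floating-point rounding.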

We will prove the identifiability of the functions $f_1$, $f_2$ and $f_3$ under the
constraints that $\int_0^1 f_1(x)\,dx = \int_0^1 f_2(y)\,dy = 1$. We will do this for two
scenarios. In the first case, we assume that $f_1$, $f_2$ and $f_3$ are smooth functions.
Then identification follows by a simple argument. Our second result does
not make use of smoothness conditions of the component functions. It only
requires conditions on the shape of the set $\mathcal{I}$. The second result is important
for an understanding of our estimation procedure, which is based on a
projection onto the model (2.1) without using a smoothing procedure for the
component functions.

Our first identifiability result makes use of the following conditions:

(A1) The projections of the set $\mathcal{I}$ onto the $x$- and $y$-axis equal $[0,1]$.

(A2) For every $z \in [0,1)$ there exists $(x,y)$ in the interior of $\mathcal{I}$ with
$m_J(x+y) = z$. Furthermore, for every $x, y \in (0,1)$ there exist $x'$ and $y'$ with $(x,y')$
and $(x',y)$ in the interior of $\mathcal{I}$.

(A3) The functions $f_1, f_2, f_3$ are bounded away from zero and infinity on
their supports.

(A4) The functions $f_1$ and $f_2$ are differentiable on $[0,1]$. The function $f_3$
is twice differentiable on $[0,1)$.

(A5) There exist sequences $x_0 = 0 < x_1 < \cdots < x_k = 1$ and $y_0 = 1 > y_1 > \cdots > y_k = 0$
with $(x, y_j) \in \mathcal{I}$ for $x_j < x < x_{j+1}$.

Theorem 1. Assume that model (2.1) holds with (A1)-(A5). Then the
functions $f_1, f_2, f_3$ are identifiable.

Remark 1. Let $T = \max\{x + y : (x,y) \in \mathcal{I}\}$. We note that the functions
$f_j$ are not identifiable in case $J < 1/T$. To see this, we take $f_1(u) = f_2(u) = c_1 e^{u}$
and $f_3(u) = e^{u}$, with the constant $c_1 > 0$ chosen for $f_1 = f_2$ to satisfy the
constraint $\int_0^1 f_j(u)\,du = 1$. Consider also $g_1(u) = g_2(u) = c_2 e^{(1+J)u}$ and $g_3(u) \equiv c_1^2/c_2^2$,
with the constant $c_2 > 0$ chosen for $g_1 = g_2$ to satisfy the constraint
$\int_0^1 g_j(u)\,du = 1$. In case $J < 1/T$, we have $m_J(x+y) = J(x+y)$ for all $(x,y) \in \mathcal{I}$.
This implies that $(f_1, f_2, f_3)$ and $(g_1, g_2, g_3)$ give the same multiplicative
density, namely $c_1^2 e^{(1+J)(x+y)}$. In fact, if $J < 1/T$, then the assumption (A2) is not fulfilled.
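
The counterexample in Remark 1 can be checked numerically. The sketch below assumes a triangular support $\{x + y \le 1\}$ (so $T = 1$) and the value $J = 0.5$, both chosen here purely for illustration, and compares the two multiplicative densities at random points. The helper name `max_relative_gap` is hypothetical.

```python
import math
import random

J = 0.5                                   # assumed value with J < 1/T for T = 1
c1 = 1.0 / (math.e - 1.0)                 # makes c1 * exp(u) integrate to 1 on [0, 1]
c2 = (1.0 + J) / (math.exp(1.0 + J) - 1)  # makes c2 * exp((1+J)u) integrate to 1

def m_J(t):
    return (J * t) % 1.0                  # equals J * t whenever t < 1/J

def f_density(x, y):
    return c1 * math.exp(x) * c1 * math.exp(y) * math.exp(m_J(x + y))

def g_density(x, y):
    g3 = (c1 / c2) ** 2                   # constant third component
    return c2 * math.exp((1 + J) * x) * c2 * math.exp((1 + J) * y) * g3

def max_relative_gap(n_points=1000, seed=1):
    """Largest relative difference between the two densities on the triangle."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x + y > 1.0:                   # reflect the point into the triangle
            x, y = 1.0 - x, 1.0 - y
        fx, gx = f_density(x, y), g_density(x, y)
        gap = max(gap, abs(fx - gx) / fx)
    return gap
```

The gap is at floating-point level, which is what the remark asserts: two different component triples produce one and the same joint density when the period $1/J$ exceeds $T$.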

We now come to our second identifiability result, which does not require
smoothness conditions for the functions $f_1$, $f_2$ and $f_3$. It makes use of
the following conditions on the shape of the support set $\mathcal{I}$. To introduce
them, we let $I_1(y) = \{x : (x,y) \in \mathcal{I}\}$, $I_2(x) = \{y : (x,y) \in \mathcal{I}\}$
and $I_{3l}(z) = \{x \in [0,1] : (x, (z+l)/J - x) \in \mathcal{I}\}$. Below, we
assume that these sets change smoothly as $y$, $x$ and $z$, respectively, move.
Here, $A \,\Delta\, B$ denotes the symmetric difference of two sets $A$ and $B$ in $\mathbb{R}$,
and $\operatorname{mes}(A)$ the Lebesgue measure of a set $A \subset \mathbb{R}$. Recall the definition
$T = \max\{x + y : (x,y) \in \mathcal{I}\}$, and with this define $L(J)$ to be the largest integer
that is less than or equal to $TJ$.

(A6) For $j \in \{1,2,3\}$ there exist partitions $0 = a^j_0 < \cdots < a^j_{L_j} = 1$ of $[0,1]$
and a function $\kappa : [0,1] \to \mathbb{R}^+$ with $\kappa(x) \to 0$ for $x \to 0$ such that (i) for all
$u_1, u_2 \in (a^j_{l-1}, a^j_l)$, $\operatorname{mes}[I_j(u_1) \,\Delta\, I_j(u_2)] \le \kappa(|u_1 - u_2|)$, $l = 1, \ldots, L_j$; $j = 1, 2$;
(ii) for all $u_1, u_2 \in (a^3_{l-1}, a^3_l)$, $\sum_{k=0}^{L(J)} \operatorname{mes}[I_{3k}(u_1) \,\Delta\, I_{3k}(u_2)] \le \kappa(|u_1 - u_2|)$,
$l = 1, \ldots, L_3$. Furthermore, it holds that $\operatorname{mes}(I_2(x)) > 0$, $\operatorname{mes}(I_1(y)) > 0$ and
$\sum_{l=0}^{L(J)} \operatorname{mes}[I_{3l}(z)] > 0$ for $x, y \in (0,1)$ and for $z \in [0,1)$.

Assumption (A6) will be used to prove the continuity of some relevant
functions that appear in the technical arguments. The continuity of a function
$\gamma$ implies that $\gamma(x) = 0$ for all $x$ if it is zero for almost all $x$. The assumption
allows a finite number of jumps in $I_j(u)$ for $j = 1, 2$ and $I_{3k}(u)$ as $u$ moves.
For example, suppose that $\mathcal{I} = \{(x,y) : 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 5/4\}$
and $J = 2$. In this case, $L(J) = 2$, and for $k = 0, 1$ we have $I_{3k}(z) = [0, (z+k)/2]$
for all $z \in [0,1)$, so that $I_{3k}(z)$ changes smoothly as $z$ varies on $[0,1)$.
However, for $k = 2$ we get that $I_{3k}(z) = [z/2, 1]$ for $z \in [0, 1/2]$ and $I_{3k}(z)$
is empty for $z \in (1/2, 1)$, thus it changes drastically at $z = 1/2$. In fact,
$\lim_{h \to 0} \sum_{k=0}^{2} \operatorname{mes}[I_{3k}(z+h) \,\Delta\, I_{3k}(z-h)] \ne 0$ for $z = 1/2$. We note that in
this case assumption (A6) holds if we split $[0,1)$ into two pieces, $[0, 1/2)$
and $(1/2, 1)$.

Fig. 1. Shapes of possible support sets. The horizontal axis indicates the onset (x) and
the vertical the development (y).
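
The worked example can be reproduced with a short Python sketch. It assumes the support $\{0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 5/4\}$ and $J = 2$ from the example; the function names `section`, `sym_diff_length` and `jump` are hypothetical helpers introduced here.

```python
def section(k, z, J=2.0, cap=1.25):
    """Interval of x in [0, 1] with (x, (z + k)/J - x) in the support, or None."""
    s = (z + k) / J                       # value of x + y on this line
    if s > cap:                           # the line misses the support entirely
        return None
    lo, hi = max(0.0, s - 1.0), min(1.0, s)
    return (lo, hi) if lo <= hi else None

def length(iv):
    return 0.0 if iv is None else iv[1] - iv[0]

def sym_diff_length(a, b):
    """Lebesgue measure of the symmetric difference of two intervals."""
    if a is None or b is None:
        return length(a) + length(b)
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return length(a) + length(b) - 2.0 * overlap

def jump(z, h, n_lines=3):
    """Sum over k = 0, 1, 2 of mes[I_3k(z + h) symmetric-difference I_3k(z - h)]."""
    return sum(sym_diff_length(section(k, z + h), section(k, z - h))
               for k in range(n_lines))
```

Evaluating `jump(0.5, h)` for small `h` stays near 0.75, while `jump(0.25, h)` shrinks with `h`; this is the discontinuity at $z = 1/2$ that motivates splitting $[0,1)$ at that point.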

The assumptions (A1), (A2), (A5) and (A6) accommodate a variety of
sets $\mathcal{I}$ that arise in real applications. Figure 1 depicts some realistic examples
of the set $\mathcal{I}$ that satisfy the assumptions. In particular, sets of the
type in panels (c) and (e) satisfy (A2) and (A6) if the maximal vertical
or horizontal thickness of the stripe is larger than the period $1/J$ of the
third component function $f_3(m_J(\cdot))$. In the interpretation of the examples
in Figure 1, we follow the equivalent discussion from Keiding (1991) and
Kuang, Nielsen and Nielsen (2008). The triangle in Figure 1(a) is typical
for insurance or mortality when none of the underwriting years or cohorts
are fully run-off. The standard actuarial term “fully run-off” means that
all events from that underwriting year or cohort have been observed. In almost
all practical cases of estimating outstanding liabilities, actuaries stick
to the triangle format, leaving out fully run-off underwriting years. While
the triangle also appears in mortality studies, it is common there to leave
the fully run-off cohorts in the study, resulting in the support shape given in
Figure 1(b). The support in Figure 1(c) arises when the data analyst only
considers observations from the most recent calendar years. While this approach
is omnipresent in practical actuarial science, there is no formal theory
or mathematical model behind these procedures in the actuarial literature.
This paper is therefore an important step toward formalising actuarial
practice mathematically while at the same time improving it. The supports given
in Figure 1(d) and (e) arise when there is a known time transformation
such that time runs at a different pace for different underwriting years
or cohort years. Time transformations of this type are well known
in mortality studies, where they are often coined as versions of accelerated failure time
models. Time transformations are also well known in actuarial science, where they are coined
as operational time. However, the academic literature of actuarial science is
still struggling to find a formal definition of what operational time is. This
paper offers one potential solution to this outstanding and important issue.
The last panel, Figure 1(f), is included to give an impression of the generality of
support structures one could deal with inside our model approach. Data are
missing at the beginning and end of the delay period, but the model is still
valid and in-sample forecasts can be constructed.

The model (2.1) has taken structured density forecasting into a new territory
by leaving the simple multiplicative model. If $f_3$ above were constant
(and therefore not in the model), then our model would reduce to the simple
multiplicative model analysed in Martínez-Miranda et al. (2013) and Mammen,
Martínez-Miranda and Nielsen (2015). These two papers point out that the
simple multiplicative density forecasting model is a continuous version of a
widely used parametric approach corresponding to a structured histogram
version of in-sample density forecasting based on the simple multiplicative
model. The in-sample density forecasting model under investigation in this
paper generalizes the simple multiplicative approach in an intuitive and simple
way by including seasonal effects.

In the following theorem, we show that, if there are two multiplicative
representations of the joint density $f$ that agree on almost all points in $\mathcal{I}$,
then the component functions also agree on almost all points in $[0,1]$. We will
use this result later in the asymptotic analysis of our estimation procedure.

Theorem 2. Assume that model (2.1) holds with (A1)-(A3), (A5),
(A6). Suppose that $(g_1, g_2, g_3)$ is a tuple of functions that are bounded away
from zero and infinity with $\int_0^1 g_1(x)\,dx = \int_0^1 g_2(y)\,dy = 1$. Let $\mu_j = \log f_j - \log g_j$.
Assume that $\mu_1(x) + \mu_2(y) + \mu_3(m_J(x+y)) = 0$ a.e. on $\mathcal{I}$. Then $\mu_j = 0$
a.e. on $[0,1]$.
3. Methodology. We describe the estimation method for the model (2.1).
We first note that the marginal densities of $X$, $Y$ and $m_J(X+Y)$ may be
zero even if we assume that the joint density is bounded away from zero.
For example, the marginal densities of $X$ and $Y$ at the point $u = 1$ are zero
for the support set $\mathcal{I}$ given in Figure 1(a). We estimate the multiplicative
density model on a region where we observe sufficient data. This means
that we exclude the points $(1,0)$ and $(0,1)$ in the estimation in the case
of Figure 1(a), and the point $(1,0)$ in the case of Figure 1(b). Formally,
for a set $S \subset \mathcal{I}$, let $J_1$ and $J_2$ denote versions of $I_1$ and $I_2$, respectively,
defined by $J_1(y) = \{x : (x,y) \in S\}$ and $J_2(x) = \{y : (x,y) \in S\}$, and define
$J_{3l}(z) = \{x : (x, (z+l)/J - x) \in S\}$. We take an arbitrarily small number
$\delta > 0$, and find the largest set $S$ such that

$\operatorname{mes}(J_2(x)) \ge \delta, \quad \operatorname{mes}(J_1(y)) \ge \delta, \quad \sum_{l=0}^{L(J)} \operatorname{mes}(J_{3l}(m_J(x+y))) \ge \delta \quad$ for all $(x,y) \in S$,

where $\operatorname{mes}(A)$ for a set $A$ denotes its length. Such a set is given by
$S = \{(x,y) : 0 \le x \le 1-\delta,\ 0 \le y \le 1-\delta,\ x + y \le 1\}$ in the case of Figure 1(a), and
$S = \{(x,y) \in \mathcal{I} : 0 \le x \le 1-\delta\}$ in the case of Figure 1(b), for example.

We estimate $f_j$ on $S$. Let $S_1$ and $S_2$ be the projections of $S$ onto the $x$-
and $y$-axis, that is, $S_1 = \{x \in [0,1] : (x,y) \in S$ for some $y \in [0,1]\}$ and
$S_2 = \{y \in [0,1] : (x,y) \in S$ for some $x \in [0,1]\}$, and let $S_3 = \{m_J(x+y) : (x,y) \in S\}$. In the
case of Figure 1(a), $S_1 = S_2 = [0, 1-\delta]$, $S_3 = [0,1)$, but in the case of
Figure 1(b), $S_1 = [0, 1-\delta]$, $S_2 = [0,1]$, $S_3 = [0,1)$. We put the following
constraints on $f_j$:

$\int_{S_1} f_1(x)\,dx = \int_{S_2} f_2(y)\,dy = 1.$

This is only for convenience. Now, we define $f_{w,1}(x) = \int_{J_2(x)} f(x,y)\,dy$,
$f_{w,2}(y) = \int_{J_1(y)} f(x,y)\,dx$ and $f_{w,3}(z) = \sum_{l=0}^{L(J)} \int_{J_{3l}(z)} f(x, (z+l)/J - x)\,dx$.
Then the model (2.1) gives the following integral equations:

(3.1)
$f_{w,1}(x) = f_1(x) \int_{J_2(x)} f_2(y)\, f_3(m_J(x+y))\,dy, \quad x \in S_1,$
$f_{w,2}(y) = f_2(y) \int_{J_1(y)} f_1(x)\, f_3(m_J(x+y))\,dx, \quad y \in S_2,$
$f_{w,3}(z) = f_3(z) \sum_{l=0}^{L(J)} \int_{J_{3l}(z)} f_1(x)\, f_2((z+l)/J - x)\,dx, \quad z \in S_3.$
We note that the marginal functions on the left-hand sides of the above
equations are bounded away from zero on $S_j$. Specifically, $\inf_{u \in S_j} f_{w,j}(u) \ge \delta \inf_{(x,y) \in S} f(x,y) > 0$,
so that $f_j$ in the equations are well-defined.

Suppose that we are given a preliminary estimator of the joint density $f$.
Call it $\hat f$. We estimate $f_{w,j}$ by $\hat f_{w,j}$, which are defined as $f_{w,j}$, respectively, with
$f$ being replaced by the preliminary estimator $\hat f$. Our proposed estimators $\hat f_j$
of $f_j$, for $j = 1, 2, 3$, are obtained by replacing $f_{w,j}$ in the integral equations
(3.1) by $\hat f_{w,j}$, respectively, and solving the resulting equations for the
multiplicative components. Let $\vartheta = \int_S f(x,y)\,dx\,dy$ and $\hat\vartheta$ be its estimator defined
by $\hat\vartheta = n^{-1} \sum_{i=1}^n \mathbb{1}[(X_i, Y_i) \in S]$. Putting the constraints

(3.2)  $\int_{S_1} \hat f_1(x)\,dx = \int_{S_2} \hat f_2(y)\,dy = 1, \quad \int_S \hat f_1(x)\, \hat f_2(y)\, \hat f_3(m_J(x+y))\,dx\,dy = \hat\vartheta,$

they are given as the solution of the following backfitting equations:

(3.3)
$\hat f_1(x) = \hat\theta_1 \, \dfrac{\hat f_{w,1}(x)}{\int_{J_2(x)} \hat f_2(y)\, \hat f_3(m_J(x+y))\,dy},$
$\hat f_2(y) = \hat\theta_2 \, \dfrac{\hat f_{w,2}(y)}{\int_{J_1(y)} \hat f_1(x)\, \hat f_3(m_J(x+y))\,dx},$
$\hat f_3(z) = \hat\theta_3 \, \dfrac{\hat f_{w,3}(z)}{\sum_{l=0}^{L(J)} \int_{J_{3l}(z)} \hat f_1(x)\, \hat f_2((z+l)/J - x)\,dx},$

where $\hat\theta_j$ are chosen so that $\hat f_j$ satisfy (3.2).

The solution of (3.3) is not given explicitly. The estimates are calculated
by an iterative algorithm with a starting set of function estimates $\hat f_1^{[0]}$ and
$\hat f_2^{[0]}$ that satisfy the constraints (3.2). With the initial estimates, we compute
$\hat f_3^{[0]}$ from the third equation at (3.3). Then we update $\hat f_j^{[k]}$ consecutively
for $j = 1, 2, 3$ and for $k \ge 1$ by the equations at (3.3) until convergence.
Specifically, we compute at the $k$th cycle ($k \ge 1$) of the iteration

(3.4)
$\hat f_1^{[k]}(x) = \hat\theta_1^{[k]} \, \dfrac{\hat f_{w,1}(x)}{\int_{J_2(x)} \hat f_2^{[k-1]}(y)\, \hat f_3^{[k-1]}(m_J(x+y))\,dy},$
$\hat f_2^{[k]}(y) = \hat\theta_2^{[k]} \, \dfrac{\hat f_{w,2}(y)}{\int_{J_1(y)} \hat f_1^{[k]}(x)\, \hat f_3^{[k-1]}(m_J(x+y))\,dx},$
$\hat f_3^{[k]}(z) = \hat\theta_3^{[k]} \, \dfrac{\hat f_{w,3}(z)}{\sum_{l=0}^{L(J)} \int_{J_{3l}(z)} \hat f_1^{[k]}(x)\, \hat f_2^{[k]}((z+l)/J - x)\,dx},$

where $\hat\theta_j^{[k]}$ are chosen so that the resulting $\hat f_j^{[k]}$ satisfy (3.2).
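
The iteration can be sketched on a grid, with sums in place of integrals. The Python code below is a simplified illustration under stated assumptions, not the authors' implementation: it works on a midpoint grid over the triangle $x + y \le 1$ with $J = 2$, applies no $\delta$-trimming of the support, takes the joint density table as given rather than estimating it, and uses the hypothetical names `backfit` and `demo`. Each component is updated with the most recent versions of the other two, which is one reading of "consecutively" in the description above.

```python
import numpy as np

def backfit(F, mask, zidx, n_z, h, max_iter=5000, tol=1e-12):
    """Grid version of the backfitting cycle; integrals become sums times h."""
    F = np.where(mask, F, 0.0)
    fw1 = F.sum(axis=1) * h                      # integral of F over y
    fw2 = F.sum(axis=0) * h                      # integral of F over x
    fw3 = np.bincount(zidx[mask], weights=F[mask], minlength=n_z) * h
    theta = F.sum() * h * h                      # total mass of F on the support
    f1, f2, f3 = np.ones(F.shape[0]), np.ones(F.shape[1]), np.ones(n_z)
    for _ in range(max_iter):
        prev = np.concatenate([f1, f2, f3])
        den = np.where(mask, f2[None, :] * f3[zidx], 0.0).sum(axis=1) * h
        f1 = fw1 / den
        f1 /= f1.sum() * h                       # integral of f1 equals 1
        den = np.where(mask, f1[:, None] * f3[zidx], 0.0).sum(axis=0) * h
        f2 = fw2 / den
        f2 /= f2.sum() * h                       # integral of f2 equals 1
        prod = f1[:, None] * f2[None, :]
        den = np.bincount(zidx[mask], weights=prod[mask], minlength=n_z) * h
        f3 = fw3 / den
        mass = np.where(mask, prod * f3[zidx], 0.0).sum() * h * h
        f3 *= theta / mass                       # fitted mass equals theta
        if np.max(np.abs(np.concatenate([f1, f2, f3]) - prev)) < tol:
            break
    return f1, f2, f3

def demo(N=16, J=2):
    """Recover known components from an exactly multiplicative table."""
    h = 1.0 / N
    a, b = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    mask = a + b + 1 <= N                        # midpoints with x + y <= 1
    n_z = N // J                                 # number of grid values of m_J
    zidx = (a + b + 1) % n_z                     # grid index of m_J(x + y)
    u = (np.arange(N) + 0.5) * h
    t1 = 1.5 - u
    t1 /= t1.sum() * h                           # normalise on the grid
    t2 = 1.25 - 0.75 * u**2
    t2 /= t2.sum() * h
    t3 = np.sin(2 * np.pi * np.arange(n_z) / n_z) + 1.5
    F = t1[:, None] * t2[None, :] * t3[zidx]
    f1, f2, f3 = backfit(F, mask, zidx, n_z, h)
    return (np.max(np.abs(f1 - t1)), np.max(np.abs(f2 - t2)),
            np.max(np.abs(f3 - t3)))
```

Calling `demo()` returns the maximal absolute errors of the three recovered components. Because the input table is exactly of multiplicative form and the period $1/J$ is shorter than the range of $x + y$, the components are pinned down by the normalisations and the errors should be at the level of the stopping tolerance. With data, `F` would be replaced by a preliminary density estimate on the grid.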

We note that the naive two-dimensional kernel density estimator is not
consistent near the boundary region, which jeopardizes the properties of the
solution of the backfitting equation (3.3) at boundaries. For a preliminary
estimator $\hat f$ of the joint density $f$, we use the local linear estimation
technique. The local linear estimator $\hat f$ we consider here is similar in spirit to
the proposal of Cheng (1997). Let $a(u,v;x,y) = (1, (u-x)/h_1, (v-y)/h_2)^\top$
and define

$A(x,y) = \int_S a(u,v;x,y)\, a(u,v;x,y)^\top\, h_1^{-1} h_2^{-1} K\!\left(\dfrac{u-x}{h_1}\right) K\!\left(\dfrac{v-y}{h_2}\right) du\,dv,$

where $(h_1, h_2)$ is the bandwidth vector and $K$ is a symmetric univariate
probability density function. Also, define

$\hat b(x,y) = n^{-1} \sum_{i=1}^n a(X_i, Y_i; x, y)\, h_1^{-1} h_2^{-1} K\!\left(\dfrac{X_i - x}{h_1}\right) K\!\left(\dfrac{Y_i - y}{h_2}\right) W_i,$

where $W_i = 1$ if $(X_i, Y_i) \in S$ and 0 otherwise. The local linear density
estimator $\hat f$ we consider in this paper is defined by $\hat f = \hat\eta_0$, where $\hat\eta = (\hat\eta_0, \hat\eta_1, \hat\eta_2)^\top$ is
given by

(3.5)  $\hat\eta(x,y) = A(x,y)^{-1}\, \hat b(x,y).$

It is alternatively defined as

$\hat\eta(x,y) = \arg\min_{\eta} \lim_{b_1, b_2 \to 0} \int_S \left[\hat f_{b_1,b_2}(u,v) - a(u,v;x,y)^\top \eta\right]^2 h_1^{-1} h_2^{-1} K\!\left(\dfrac{u-x}{h_1}\right) K\!\left(\dfrac{v-y}{h_2}\right) du\,dv,$

where $\hat f_{b_1,b_2}$ is the standard two-dimensional kernel density estimator
defined by

$\hat f_{b_1,b_2}(x,y) = n^{-1} \sum_{i=1}^n b_1^{-1} b_2^{-1} K\!\left(\dfrac{X_i - x}{b_1}\right) K\!\left(\dfrac{Y_i - y}{b_2}\right)$

for a bandwidth vector $(b_1, b_2)$.

Before we close this section, we give two remarks. One is that, instead of
integrating the two-dimensional estimator $\hat f$, one may estimate $f_{w,j}$ directly
from the data. In particular, one may estimate $f_{w,j}$ by the one-dimensional
kernel density estimators

$\hat f_{w,1}(x) = n^{-1} \sum_{i=1}^n h_1^{-1} K\!\left(\dfrac{X_i - x}{h_1}\right) W_i,$
$\hat f_{w,2}(y) = n^{-1} \sum_{i=1}^n h_2^{-1} K\!\left(\dfrac{Y_i - y}{h_2}\right) W_i,$
$\hat f_{w,3}(z) = n^{-1} \sum_{i=1}^n h_3^{-1} K\!\left(\dfrac{m_J(X_i + Y_i) - z}{h_3}\right) W_i.$

The theory that we present in the next section is valid for this alternative
estimation procedure. The other remark is that one
may also be interested in an extension of the model (2.1) that arises when
one observes a covariate $\mathbf{U}_i$ along with $(X_i, Y_i)$. A natural extension of
the model (2.1) in this case is that the conditional density of $(X,Y)$ given
$\mathbf{U} = \mathbf{u}$ has the form $f(x,y \mid \mathbf{u}) = f_1(x,\mathbf{u})\, f_2(y,\mathbf{u})\, f_3(m_J(x+y),\mathbf{u})$, $(x,y) \in \mathcal{I}$,
where the normalization constraints on $f_1$ and $f_2$ now apply to $f_1(\cdot,\mathbf{u})$ and $f_2(\cdot,\mathbf{u})$ for each $\mathbf{u}$. The
method and theory for this extended model are easy to derive from those
we present here.
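
The one-dimensional estimators of the first remark can be sketched in a few lines of Python. The Epanechnikov kernel and the names `epanechnikov`, `weighted_kde` and `marginal_estimates` are assumptions made here for illustration; the paper only requires a symmetric kernel. No boundary correction is applied in this sketch.

```python
import numpy as np

def epanechnikov(u):
    """A symmetric probability density supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def weighted_kde(obs, w, grid, h):
    """n^{-1} * sum_i h^{-1} K((obs_i - x)/h) * w_i at every x in grid."""
    u = (obs[None, :] - grid[:, None]) / h
    return (epanechnikov(u) * w[None, :]).sum(axis=1) / (len(obs) * h)

def marginal_estimates(X, Y, W, grid, h, J):
    """Kernel estimates of the three weighted marginals from the raw data."""
    Z = (J * (X + Y)) % 1.0               # m_J(X_i + Y_i)
    return (weighted_kde(X, W, grid, h[0]),
            weighted_kde(Y, W, grid, h[1]),
            weighted_kde(Z, W, grid, h[2]))

# Tiny hypothetical data set: the third observation falls outside S (W = 0).
X = np.array([0.2, 0.5, 0.7])
Y = np.array([0.1, 0.3, 0.6])
W = np.array([1.0, 1.0, 0.0])
grid = np.linspace(-0.5, 1.5, 4001)
fw1, fw2, fw3 = marginal_estimates(X, Y, W, grid, h=(0.1, 0.1, 0.1), J=2)
```

Each estimate integrates to the fraction of observations with $W_i = 1$, here $2/3$, which mirrors the role of $\hat\vartheta$ in the constraints.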


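4. Theoretical properties. Let $\mathcal{S}$ denote the space of function tuples $g = (g_1, g_2, g_3)$
with square integrable univariate functions $g_j$ in the space $L_2[0,1]$.
Define nonlinear functionals $\mathcal{F}_j$ for $1 \le j \le 3$ on $\mathcal{S}$ by

$\mathcal{F}_1(g) = 1 - \int_{S_1} g_1(x)\,dx,$
$\mathcal{F}_2(g) = 1 - \int_{S_2} g_2(y)\,dy,$
$\mathcal{F}_3(g) = \vartheta - \int_S g_1(x)\, g_2(y)\, g_3(m_J(x+y))\,dx\,dy.$

Also, define nonlinear functionals $\mathcal{F}_j$ for $4 \le j \le 6$, now on $\mathbb{R}^3 \times \mathcal{S}$, by

$\mathcal{F}_4(\theta, g)(x) = \int_{J_2(x)} [\theta_1 f(x,y) - g_1(x)\, g_2(y)\, g_3(m_J(x+y))]\,dy,$
$\mathcal{F}_5(\theta, g)(y) = \int_{J_1(y)} [\theta_2 f(x,y) - g_1(x)\, g_2(y)\, g_3(m_J(x+y))]\,dx,$
$\mathcal{F}_6(\theta, g)(z) = \sum_{l=0}^{L(J)} \int_{J_{3l}(z)} [\theta_3 f(x, (z+l)/J - x) - g_1(x)\, g_2((z+l)/J - x)\, g_3(z)]\,dx,$

where $\theta = (\theta_1, \theta_2, \theta_3)^\top$. Then we define a nonlinear operator $\mathcal{F} : \mathbb{R}^3 \times \mathcal{S} \to \mathbb{R}^3 \times \mathcal{S}$
by $\mathcal{F}(\theta, g)(x,y,z) = (\mathcal{F}_1(g), \mathcal{F}_2(g), \mathcal{F}_3(g), \mathcal{F}_4(\theta,g)(x), \mathcal{F}_5(\theta,g)(y), \mathcal{F}_6(\theta,g)(z))^\top$.

Now, we define nonlinear functionals $\hat{\mathcal{F}}_j$ for $1 \le j \le 3$ on $\mathcal{S}$ and $\hat{\mathcal{F}}_j$ for
$4 \le j \le 6$ on $\mathbb{R}^3 \times \mathcal{S}$ as $\mathcal{F}_j$ in the above, with the joint density $f$ being replaced by
its estimator $\hat f$ and $\vartheta$ by $\hat\vartheta$. Let $\hat{\mathcal{F}} : \mathbb{R}^3 \times \mathcal{S} \to \mathbb{R}^3 \times \mathcal{S}$ be the nonlinear operator
defined by $\hat{\mathcal{F}}(\theta, g)(x,y,z) = (\hat{\mathcal{F}}_1(g), \hat{\mathcal{F}}_2(g), \hat{\mathcal{F}}_3(g), \hat{\mathcal{F}}_4(\theta,g)(x), \hat{\mathcal{F}}_5(\theta,g)(y), \hat{\mathcal{F}}_6(\theta,g)(z))^\top$.
Our estimators $\hat{\mathbf{f}} = (\hat f_1, \hat f_2, \hat f_3)$ along with $\hat\theta = (\hat\theta_1, \hat\theta_2, \hat\theta_3)$ are
given as the solution of the equation

(4.1)  $\hat{\mathcal{F}}(\hat\theta, \hat{\mathbf{f}}) = 0.$

From the definition of the nonlinear operator $\mathcal{F}$ we also get $\mathcal{F}(\mathbf{1}, \mathbf{f}) = 0$,
where $\mathbf{1} = (1,1,1)^\top$ and $\mathbf{f} = (f_1, f_2, f_3)^\top$ for the true component functions $f_j$.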

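We consider a theoretical approximation of $\hat{\mathbf{f}}$. Define a nonlinear operator
$\mathcal{G}$ by $\mathcal{G}(\theta, g) = \mathcal{F}(\mathbf{1} + \theta, \mathbf{f} \circ (\mathbf{1} + g))$, where $g_1 \circ g_2$ denotes the entrywise
multiplication of the two function vectors $g_1$ and $g_2$. Then $\mathcal{G}(0, 0) = 0$.
Let $\mathcal{G}'(d, \delta)$ denote the derivative of $\mathcal{G}(\theta, g)$ at $(\theta, g) = (0, 0)$ in the direction
$(d, \delta)$. We write $\mathbf{f}_w(x,y,z) = (f_{w,1}(x), f_{w,2}(y), f_{w,3}(z))^\top$ and
$\hat\mu(x,y,z) = (\hat\mu_1(x), \hat\mu_2(y), \hat\mu_3(z))^\top$, where

(4.2)
$\hat\mu_1(x) = f_{w,1}(x)^{-1} \int_{J_2(x)} [\hat f(x,y) - f(x,y)]\,dy,$
$\hat\mu_2(y) = f_{w,2}(y)^{-1} \int_{J_1(y)} [\hat f(x,y) - f(x,y)]\,dx,$
$\hat\mu_3(z) = f_{w,3}(z)^{-1} \sum_{l=0}^{L(J)} \int_{J_{3l}(z)} [\hat f(x, (z+l)/J - x) - f(x, (z+l)/J - x)]\,dx.$

Let $\mathcal{G}'^{-1} : \mathbb{R}^3 \times \mathcal{S} \to \mathbb{R}^3 \times \mathcal{S}$ denote the inverse of $\mathcal{G}'$, whose existence we will
prove in the Appendix. We define $\bar{\mathbf{f}} = (\bar f_1, \bar f_2, \bar f_3)$ along with $\bar\theta = (\bar\theta_1, \bar\theta_2, \bar\theta_3)$
by

(4.3)  $\begin{pmatrix} \bar\theta - \mathbf{1} \\ (\bar{\mathbf{f}} - \mathbf{f})/\mathbf{f} \end{pmatrix} = \mathcal{G}'^{-1} \begin{pmatrix} 0 \\ -\mathbf{f}_w \circ \hat\mu \end{pmatrix},$

where $g_1/g_2$ denotes the entrywise division of the function vector $g_1$ by $g_2$.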

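It can be seen that $\bar\delta = (\bar\delta_1, \bar\delta_2, \bar\delta_3)^\top = ((\bar f_1 - f_1)/f_1, (\bar f_2 - f_2)/f_2, (\bar f_3 - f_3)/f_3)^\top$
along with $\bar d = (\bar d_1, \bar d_2, \bar d_3)^\top = (\bar\theta_1 - 1, \bar\theta_2 - 1, \bar\theta_3 - 1)^\top$ are given as
the solution $(\delta, d)$ of the following system of integral equations:

(4.4)
$\delta_1(x) = d_1 + \hat\mu_1(x) - \int_{J_2(x)} \delta_2(y)\, \dfrac{f(x,y)}{f_{w,1}(x)}\,dy - \int_{J_2(x)} \delta_3(m_J(x+y))\, \dfrac{f(x,y)}{f_{w,1}(x)}\,dy, \quad x \in S_1,$
$\delta_2(y) = d_2 + \hat\mu_2(y) - \int_{J_1(y)} \delta_1(x)\, \dfrac{f(x,y)}{f_{w,2}(y)}\,dx - \int_{J_1(y)} \delta_3(m_J(x+y))\, \dfrac{f(x,y)}{f_{w,2}(y)}\,dx, \quad y \in S_2,$
$\delta_3(z) = d_3 + \hat\mu_3(z) - \sum_{l=0}^{L(J)} \int_{J_{3l}(z)} \delta_1(x)\, \dfrac{f(x, (z+l)/J - x)}{f_{w,3}(z)}\,dx - \sum_{l=0}^{L(J)} \int_{J_{3l}(z)} \delta_2((z+l)/J - x)\, \dfrac{f(x, (z+l)/J - x)}{f_{w,3}(z)}\,dx, \quad z \in S_3,$

subject to the constraints

(4.5)
$0 = \int_{S_1} f_1(x)\, \delta_1(x)\,dx,$
$0 = \int_{S_2} f_2(y)\, \delta_2(y)\,dy,$
$0 = \int_S f(x,y)\, [\delta_1(x) + \delta_2(y) + \delta_3(m_J(x+y))]\,dx\,dy.$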

In the following theorem, we show that the approximation of $\hat{\mathbf{f}}$ by $\bar{\mathbf{f}}$ is
good enough. In the theorem, we assume that $\hat f(x,y) - f(x,y) = O_p(\varepsilon_n)$
uniformly on $S$ for some nonnegative sequence $\{\varepsilon_n\}$ that converges to zero
as $n$ tends to infinity. For the local linear estimator $\hat f$ defined by (3.5) with
$h_1 \sim h_2 \sim n^{-1/5}$, we have $\varepsilon_n = n^{-3/10}\sqrt{\log n}$. The theorem tells that the
approximation errors of $\bar f_j$ for $\hat f_j$ are of order $O_p(n^{-3/5} \log n)$. In Theorem 4
below, we will show that $\bar f_j - f_j$ have magnitude of order $O_p(n^{-2/5}\sqrt{\log n})$
uniformly on $S_j$. This means that the first-order properties of $\hat f_j$ are the
same as those of $\bar f_j$.

Theorem 3. Assume that the conditions of Theorem 2 hold, and that
the joint density $f$ is bounded away from zero and infinity on its support $S$
with continuous partial derivatives on the interior of $S$. If $\hat f(x,y) - f(x,y) = O_p(\varepsilon_n)$
uniformly for $(x,y) \in S$, then it holds that $|\hat\theta_j - \bar\theta_j| = O_p(\varepsilon_n^2)$ and
$\sup_{u \in S_j} |\hat f_j(u) - \bar f_j(u)| = O_p(\varepsilon_n^2)$.

Next, we present the limit distribution of $(\hat{\mathbf{f}} - \mathbf{f})/\mathbf{f}$. In the next theorem,
we assume that $h_1 \sim c_1 n^{-1/5}$ and $h_2 \sim c_2 n^{-1/5}$ for some constants $c_1, c_2 > 0$.
For such constants, define


(4.6) 



Also, define $\mu_j^B$ for $j = 1, 2, 3$ as $\hat\mu_j$ at (4.2) with the local linear estimator
$\hat f$ being replaced by $f^B$. In the Appendix, we will show that the asymptotic
mean of $(\hat f_j - f_j)/f_j$ equals $n^{-2/5}\beta_j$, where $\beta = (\beta_1, \beta_2, \beta_3)$ is the solution of
the backfitting equation (4.4) with $\hat\mu$ being replaced by $\mu^B$. Let $\hat f^A$ denote
the centered version of the naive two-dimensional kernel density estimator.
Specifically,

(4.7)  $\hat f^A(x,y) = n^{-1} \sum_{i=1}^n \left[K_{h_1}(X_i - x)\, K_{h_2}(Y_i - y) - E\big(K_{h_1}(X_i - x)\, K_{h_2}(Y_i - y)\big)\right].$

Here and below, we write $K_h(u) = K(u/h)/h$. Define $\hat\mu_j^A$ for $j = 1, 2, 3$ as
$\mu_j^B$ with $\hat f^A$ taking the role of $f^B$. We will also show that the asymptotic
variances of $(\hat f_j - f_j)/f_j$ equal those of $\hat\mu_j^A$, respectively, and that they are
given by $n^{-4/5}\sigma_j^2$, where

$\sigma_1^2(x) = c_1^{-1} f_{w,1}(x)^{-1} \int K^2(u)\,du,$
$\sigma_2^2(y) = c_2^{-1} f_{w,2}(y)^{-1} \int K^2(u)\,du,$
$\sigma_3^2(z) = c_2^{-1} f_{w,3}(z)^{-1} \int [K * K(u)]\,[K * K(c_1 u/c_2)]\,du = c_1^{-1} f_{w,3}(z)^{-1} \int [K * K(u)]\,[K * K(c_2 u/c_1)]\,du,$

where $K * K$ denotes the two-fold convolution of the kernel $K$.
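
The two integral expressions given for the third variance agree when their leading constants are $c_2^{-1}$ and $c_1^{-1}$, respectively; this follows from the substitution $u \mapsto c_2 u / c_1$. The Python sketch below checks the identity numerically for an Epanechnikov kernel, which is an assumption made here for illustration, and leaves out the common factor $f_{w,3}(z)^{-1}$. The names `kk_grid` and `variance_integrals` are hypothetical.

```python
import numpy as np

def kk_grid(n=2001):
    """Two-fold convolution K * K of the Epanechnikov kernel on a grid."""
    u = np.linspace(-1.0, 1.0, n)
    du = u[1] - u[0]
    K = 0.75 * (1.0 - u**2)
    v = np.linspace(-2.0, 2.0, 2 * n - 1)     # support of K * K, same spacing
    return v, np.convolve(K, K) * du

def variance_integrals(c1, c2):
    """Both forms of the kernel constant, without the factor 1 / f_{w,3}(z)."""
    v, KK = kk_grid()
    dv = v[1] - v[0]
    kk = lambda x: np.interp(x, v, KK, left=0.0, right=0.0)
    first = np.sum(KK * kk(c1 * v / c2)) * dv / c2
    second = np.sum(KK * kk(c2 * v / c1)) * dv / c1
    return first, second

first, second = variance_integrals(1.0, 2.0)
```

The two returned values coincide up to discretisation error, for any positive pair of constants.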

In the discussion of assumption (A6) in Section 2, we noted that (A6) allows
a finite number of jumps in $I_j(u)$ for $j = 1, 2$ and $I_{3l}(u)$ as $u$ changes. These
jump points are actually those where the marginal densities $f_{w,j}$ are
discontinuous. At these discontinuity points, the expression of the asymptotic
distributions of the estimators is complicated. For this reason, we consider only
those points in the partitions $(a^j_{k-1}, a^j_k)$, $1 \le k \le L_j$, for the asymptotic
distribution of $\hat f_j$, where $a^j_k$ are the points that appear in assumption (A6). We
denote by $S_{j,c}$ the resulting subset of $S_j$ after deleting all $a^j_k$, $1 \le k \le L_j - 1$.
Note that $f_{w,j}$ is continuous on $S_{j,c}$ due to (A6). In the theorem below, we
also denote by $S_j^o$ the interiors of $S_j$, $j = 1, 2, 3$.
For the limit distribution of $\hat f_j$, we put an additional condition on the
support set. To state the condition, let $J_2(u_1; h_2)$ be a subset of $J_2(u_1)$ such
that $v \in J_2(u_1; h_2)$ if and only if $v - h_2 t \in J_2(u_1)$ for all $t \in [-1, 1]$. The set
$J_2(u_1; h_2)$ is inside $J_2(u_1)$ at a depth $h_2$. In the following assumption, $a^j_k$
and $\kappa$ are the points and the function that appear in assumption (A6).

(A7) There exist constants $C > 0$ and $\alpha > 1/2$ such that the following
statements hold: (i) for any sequence of positive numbers $\varepsilon_n$, $J_2(u_1; C\varepsilon_n^{\alpha}) \subset J_2(u_2)$
for all $u_1, u_2 \in (a^1_{k-1}, a^1_k) \cap S_1$ with $|u_1 - u_2| \le \varepsilon_n$, $1 \le k \le L_1$;
$J_1(u_1; C\varepsilon_n^{\alpha}) \subset J_1(u_2)$ for all $u_1, u_2 \in (a^2_{k-1}, a^2_k) \cap S_2$ with $|u_1 - u_2| \le \varepsilon_n$, $1 \le k \le L_2$;
(ii) $\kappa(t) \le C|t|^{\alpha}$.

Theorem 4. Assume that (A7) and the conditions of Theorem 3 hold,
and that the joint density $f$ is twice partially continuously differentiable. Let
the kernel $K$ be supported on $[-1, 1]$, symmetric and Lipschitz continuous.
Let the bandwidths $h_j$ satisfy $n^{1/5} h_j \to c_j$ for some constants $c_j > 0$. Then,
for fixed points $u_j \in S_j^o \cap S_{j,c}$, it holds that $n^{2/5}(\hat f_j(u_j) - f_j(u_j))/f_j(u_j)$ are
jointly asymptotically normal with mean $(\beta_j(u_j) : 1 \le j \le 3)$ and variance
$\operatorname{diag}(\sigma_j^2(u_j) : 1 \le j \le 3)$. Furthermore, $(\hat f_j(u_j) - f_j(u_j))/f_j(u_j) = O_p(n^{-2/5}\sqrt{\log n})$
uniformly for $u_j \in S_j$.

Remark 2. In the case where the third component function $f_3$ is constant,
that is, there is no periodic component, the above theorem continues
to hold for the components $f_1$ and $f_2$ without those conditions that pertain
to the set $S_3$ and the function $f_3$.

5. Numerical properties. 

5.1. Simulation studies. We considered two densities on $\mathcal{I} = \{(x,y): 0 \le x, y \le 1, x + y \le 1\}$. Model 1 has the components $f_1 = f_2 \equiv 1$ on $[0,1]$, and $f_3(u) = c_1(\sin(2\pi u) + 3/2)$, $u \in [0,1]$, where $c_1 > 0$ is chosen to make $f(x,y) = f_1(x) f_2(y) f_3(m_J(x+y))$ be a density on $\mathcal{I}$. Model 2 has $f_1(u) = 3/2 - u$, $f_2(u) = 5/4 - 3u^2/4$ and $f_3(u) = c_2(u^3 - 3u^2/2 + u/2 + 1/2)$ for some constant $c_2 > 0$. We took $J = 2$. We computed our estimates on a grid of bandwidth choices with $h_1 = h_2$. For model 1, we took $\{0.070 + 0.001 \times j: 0 \le j \le 30\}$ in the range $[0.070, 0.100]$, and for model 2 we chose $\{0.40 + 0.02 \times j: 0 \le j \le 20\}$ in the range $[0.40, 0.80]$. In both cases, the ranges covered the optimal bandwidths. We obtained $\mathrm{MISE}_j = E\int_0^1 [\hat f_j(u) - f_j(u)]^2\,du$, $\mathrm{ISB}_j = \int_0^1 [E\hat f_j(u) - f_j(u)]^2\,du$ and $\mathrm{IV}_j = E\int_0^1 [\hat f_j(u) - E\hat f_j(u)]^2\,du$, for $1 \le j \le 3$, based on 100 pseudo samples. The sample sizes were $n = 400$ and $1000$, but only the results for $n = 400$ are reported since the lessons are the same.
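To make the simulation design concrete, the following is a minimal sketch of how pseudo samples from Model 1 could be drawn by rejection sampling. It assumes the periodic map is $m_J(t) = J\,(t \bmod 1/J)$, which is how $m_J$ behaves in the appendix; the function names, the seed and the midpoint-rule grid are illustrative choices and are not taken from the paper.

```python
import math
import random


def m_J(t, J):
    """Assumed periodic map m_J(t) = J * (t mod 1/J), with values in [0, 1)."""
    return (J * t) % 1.0


def f3_unnormalised(u):
    """Periodic component of Model 1 without the constant c_1."""
    return math.sin(2.0 * math.pi * u) + 1.5


def sample_model1(n, J=2, seed=0):
    """Draw n points from Model 1 on the triangle x, y >= 0, x + y <= 1."""
    rng = random.Random(seed)
    upper = 2.5  # supremum of sin(2*pi*u) + 3/2
    sample = []
    while len(sample) < n:
        x, y = rng.random(), rng.random()
        if x + y > 1.0:
            continue  # proposal falls outside the triangle
        # Accept with probability proportional to the unnormalised density.
        if rng.random() * upper <= f3_unnormalised(m_J(x + y, J)):
            sample.append((x, y))
    return sample


def c1_numeric(J=2, grid=2000):
    """Midpoint-rule approximation of the normalising constant c_1.

    With s = x + y, the integral over the triangle reduces to
    int_0^1 s * f3(m_J(s)) ds, because f_1 = f_2 = 1 in Model 1.
    """
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        s = (i + 0.5) * h
        total += s * f3_unnormalised(m_J(s, J)) * h
    return 1.0 / total
```

For $J = 2$ the reduced integral can also be evaluated by hand as $3/4 - 1/(4\pi)$, which gives a simple cross-check on `c1_numeric`.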

Figure 2 is for model 1. It shows the boxplots of the values of $\mathrm{MISE}_j$, $\mathrm{ISB}_j$ and $\mathrm{IV}_j$ computed using the bandwidths on the grid specified above, and thus gives some indication of how sensitive our estimators are to the choice of bandwidth. The bandwidth that gave the minimal value of $\mathrm{MISE}_1 + \mathrm{MISE}_2 + \mathrm{MISE}_3$ was $h_1 = h_2 = 0.089$ in model 1, and $h_1 = h_2 = 0.64$ in model 2, for the sample size $n = 400$. The values of $\mathrm{MISE}_j$ along with $\mathrm{ISB}_j$ and $\mathrm{IV}_j$ for these optimal bandwidths are reported in Table 1. Although our primary concern is the estimation of the component functions, it is also of interest to see how well the resulting two-dimensional density estimator $\hat f(x,y) = \hat f_1(x)\hat f_2(y)\hat f_3(m_J(x+y))$ behaves. For this, we include in the table the values of MISE, ISB and IV of the two-dimensional estimates computed using the optimal bandwidth $h_1 = h_2 = 0.089$ in model 1, and $h_1 = h_2 = 0.64$ in model 2. For comparison, we also report the results for the two-dimensional local linear estimates defined at (3.5). For the local linear estimator, we used its optimal choices $h_1 = h_2 = 0.085$ in model 1, and $h_1 = h_2 = 0.48$ in model 2. We found that the initial local linear estimates had a large portion of mass outside $\mathcal{I}$, and thus behaved very poorly if they were not rescaled to integrate to one on $\mathcal{I}$. The reported values in Table 1 are for the adjusted local linear estimates. Overall, our two-dimensional estimator has better performance than the local linear estimator, especially in model 2. Figure 3 depicts the true density of model 1 and our two-dimensional estimate that has the median performance in terms of ISE.

Fig. 2. Boxplots for the values of MISE, ISB and IV of our estimates $\hat f_j$ computed using various bandwidth choices (model 1, n = 400). Panels: MISE, ISB, IV; each panel shows f1, f2, f3.
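Since $E(\hat f - f)^2 = (E\hat f - f)^2 + \mathrm{var}(\hat f)$ pointwise, the three reported quantities should satisfy MISE = ISB + IV up to rounding. The short sketch below checks this for the entries of Table 1; the dictionary layout and the names are only illustrative.

```python
# Entries of Table 1 as (MISE, ISB, IV) for each model and column.
TABLE_1 = {
    ("Model 1", "f1"): (0.0756, 0.0528, 0.0228),
    ("Model 1", "f2"): (0.0937, 0.0752, 0.0184),
    ("Model 1", "f3"): (0.1283, 0.0963, 0.0320),
    ("Model 1", "our joint est."): (0.2493, 0.1844, 0.0649),
    ("Model 1", "local linear"): (0.2537, 0.2199, 0.0338),
    ("Model 2", "f1"): (0.0124, 0.0120, 0.0004),
    ("Model 2", "f2"): (0.0057, 0.0054, 0.0003),
    ("Model 2", "f3"): (0.0130, 0.0127, 0.0003),
    ("Model 2", "our joint est."): (0.0475, 0.0469, 0.0006),
    ("Model 2", "local linear"): (0.0624, 0.0607, 0.0017),
}


def decomposition_gap(mise, isb, iv):
    """Absolute difference between MISE and ISB + IV."""
    return abs(mise - (isb + iv))


# Largest discrepancy over all rows; values are rounded to four decimals,
# so a gap of about 1e-4 is the most that rounding alone can explain.
max_gap = max(decomposition_gap(*row) for row in TABLE_1.values())
```

The table also shows that in Model 2 the error is almost entirely bias, which is consistent with the much larger optimal bandwidth reported for that model.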

Table 1
Mean integrated squared errors (MISE), integrated squared biases (ISB) and integrated variance (IV) of the estimators

                      Component functions            Joint density
                   f1        f2        f3        Our est.   Local linear
Model 1   MISE   0.0756    0.0937    0.1283      0.2493      0.2537
          ISB    0.0528    0.0752    0.0963      0.1844      0.2199
          IV     0.0228    0.0184    0.0320      0.0649      0.0338
Model 2   MISE   0.0124    0.0057    0.0130      0.0475      0.0624
          ISB    0.0120    0.0054    0.0127      0.0469      0.0607
          IV     0.0004    0.0003    0.0003      0.0006      0.0017

5.2. Data examples. The original data set we analyze in this section was collected between 1990 and 2011 by RSA, a major global UK-based nonlife insurance company. The dataset, and more details about it, is publicly available via the Cass Business School web site together with the paper “Double Chain Ladder” at the Cass knowledge site. The observations were the incurred counts of large claims aggregated by month. During the 264 months, 1516 large claims were made. The dataset is provided in the form of a classical run-off triangle $\{N_{kl}: 1 \le k, l \le 264, k + l \le 265\}$, where $N_{kl}$ denotes the number of large claims incurred in the $k$th month and reported in the $(k + l - 1)$th month, that is, with $(l - 1)$ months delay. Since the data are grouped monthly, we need pre-smoothing of the data to apply the model (2.1), which is based on data recorded over a continuous time scale. A natural way of pre-smoothing is to perturb the data by uniform random variables. Thus, we converted each claim $(k,l)$ on the two-dimensional discrete time scale $\{(k,l): 1 \le k, l \le 264, k + l \le 265\}$ into $(X, Y)$ on the two-dimensional continuous time scale $\mathcal{I} = \{(x,y): 0 \le x, y \le 1, x + y \le 1\}$, by

$$X = \frac{k - 1 + U_1}{264}, \qquad Y = \frac{l - 1 + U_2}{264},$$

where $(U_1, U_2)$ is a two-dimensional uniform random variate on the unit square $[0,1]^2$. This gives a converted dataset $\{(X_i, Y_i): 1 \le i \le 1516\}$. We applied to this dataset our method of estimating the structured density $f$ of $(X, Y)$.
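The pre-smoothing step can be sketched as follows. The run-off triangle is represented here as a dictionary mapping $(k, l)$ to the count $N_{kl}$; this representation, the function names and the seed are assumptions made for illustration only and do not describe how the published dataset is stored.

```python
import random

N_MONTHS = 264  # number of months covered by the triangle


def perturb_claim(k, l, rng):
    """Map one claim in cell (k, l) to a point (X, Y) on the continuous scale."""
    u1, u2 = rng.random(), rng.random()
    return (k - 1 + u1) / N_MONTHS, (l - 1 + u2) / N_MONTHS


def perturb_triangle(counts, seed=0):
    """Convert a triangle {(k, l): N_kl} into a list of perturbed points.

    Each of the N_kl claims in a cell receives its own uniform perturbation.
    """
    rng = random.Random(seed)
    points = []
    for (k, l), n_kl in counts.items():
        for _ in range(n_kl):
            points.append(perturb_claim(k, l, rng))
    return points
```

A call such as `perturb_triangle({(1, 1): 2, (3, 5): 1})` returns three points, the first two of which have both coordinates inside the first monthly cell $[0, 1/264)$.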



Fig. 3. The true density (left) and our estimated two-dimensional density function 
(right) computed from the pseudo sample that gives the median performance in terms 
of ISE, for model 1 and n = 400. 
Since one month corresponds to an interval with length 1/264 on the [0,1] scale, one year is equivalent to an interval with length 12/264 = 1/22 on the latter scale. We let the periodic component $f_3(m_J(\cdot))$ in the model (2.1) reflect a possible seasonal effect, so that we take one year in real time to be the period of the function. This means that we let the periodic component $f_3(m_J(\cdot))$ have 1/22 as its period, and thus take $J = 22$. For the bandwidth, we took $h_1 = h_2 = 0.01$. The chosen bandwidth may be considered too small for the estimation of $f_1$ and $f_2$. However, we took such a small bandwidth to detect possible seasonality. Note that the bandwidth size 0.01 corresponds to 0.01 × 12 × 22 = 2.64 months. We found that even with this small bandwidth the estimated curve $\hat f_3$ was nearly a constant function, which suggests that the large claim data do not have a seasonal effect.

To see how well our method detects a possible seasonal effect in the data, we augmented the dataset by adding a certain level of seasonal effect as follows. We computed

$$
N'_{kl} =
\begin{cases}
2N_{kl}, & \text{if } k + l = 12m \text{ for some } m = 1, 2, \ldots,\\
3N_{kl}, & \text{if } k + l = 12m + 1 \text{ for some } m = 1, 2, \ldots,\\
5N_{kl}, & \text{if } k + l = 12m + 2 \text{ for some } m = 0, 1, \ldots,\\
3N_{kl}, & \text{if } k + l = 12m + 3 \text{ for some } m = 0, 1, \ldots,\\
N_{kl}, & \text{otherwise.}
\end{cases}
$$

Since $(k + l - 1 \text{ modulo } 12)$ is the actual month in which the claims were reported, the augmented dataset has added claims in November, December, January and February. The augmentation increased the total number of claims from 1516 to 2606. The counts of reported claims increased from 126 to 252 for November, from 200 to 600 for December, from 91 to 455 for January and from 100 to 300 for February.
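The augmentation rule and the reported totals can be reproduced with a few lines of code. This is only a sketch: the function names are hypothetical, and the month labelling assumes that month 1 of the triangle is a January, as implied by the statement that $(k + l - 1)$ modulo 12 gives the reporting month.

```python
MONTH_NAMES = [
    "December", "January", "February", "March", "April", "May",
    "June", "July", "August", "September", "October", "November",
]


def reporting_month(k, l):
    """Calendar month of reporting, indexed by (k + l - 1) mod 12."""
    return MONTH_NAMES[(k + l - 1) % 12]


def seasonal_factor(k, l):
    """Multiplier applied to N_kl in the augmented dataset."""
    # k + l >= 2 always, so the residues 0 and 1 only occur for m >= 1.
    return {0: 2, 1: 3, 2: 5, 3: 3}.get((k + l) % 12, 1)


# Monthly totals before augmentation, as reported in the text.
original = {"November": 126, "December": 200, "January": 91, "February": 100}
factor = {"November": 2, "December": 3, "January": 5, "February": 3}
augmented = {month: factor[month] * count for month, count in original.items()}

# Total number of claims after augmentation.
total_after = 1516 + sum(augmented[m] - original[m] for m in original)
```

The computed total agrees with the 2606 claims stated above, and the four augmented monthly counts are 252, 600, 455 and 300.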

In our estimation procedure, the bandwidths $h_1$ and $h_2$ control the smoothness of the local linear estimate of $f$ along the x- and y-axis, respectively. Consequently, choosing small values for $h_1$ and $h_2$ would result in nonsmooth estimates of the functions $f_1$ and $f_2$, which we observed in the pilot study with $h_1 = h_2 = 0.01$. Nevertheless, in some cases setting these bandwidths to be small, relative to the scales of $X$ and $Y$, might be preferred when one needs to detect possible seasonality, as is the case with the current dataset. In our dataset, the bandwidth size 1/264 ≈ 0.0038 on the scale of [0,1] corresponds to one month in real time. Thus, taking the bandwidths to be 0.015, for example, which corresponds to a period of about four months, forces the seasonal effect to almost vanish in the estimate of $f_3$.
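The conversions between the unit scale and real time that are used in this discussion amount to multiplying by 264 months. A small helper, with hypothetical names, makes the arithmetic explicit:

```python
MONTHS_TOTAL = 12 * 22  # 264 months observed, 1990 to 2011


def bandwidth_in_months(h):
    """Length in months of a bandwidth h given on the [0, 1] scale."""
    return h * MONTHS_TOTAL


def months_to_unit_scale(months):
    """Length on the [0, 1] scale of a period given in months."""
    return months / MONTHS_TOTAL
```

With these definitions, a bandwidth of 0.01 is 2.64 months, a bandwidth of 0.015 is 3.96 months, and one year is 12/264 = 1/22 on the unit scale, which is why the seasonal period corresponds to $J = 22$.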

To achieve both aims of producing smooth estimates of $f_1$ and $f_2$, and of detecting a possible seasonal effect, we applied to the augmented dataset a two-stage procedure that is based on our estimation method described in Section 3. In the first stage, we computed a local linear estimate of $f$ with $h_1 = h_2 = 0.01$, and found an estimate of $f_3$ using the iteration scheme at (3.4). In the second stage, we recomputed a local linear estimate of $f$ with larger bandwidths $h_1 = h_2 = 0.05$, and found estimates of $f_1$ and $f_2$ using only the first two updating equations at (3.4), with the third component there replaced by the estimate of $f_3$ obtained in the first stage.

Fig. 4. Estimated curves $\hat f_j$ for the model (2.1) obtained by applying the two-stage procedure to the augmented large claim data.

The results of applying this two-stage procedure to the augmented dataset are presented in Figure 4. Clearly, the seasonal effect of the augmented dataset was well recovered in the estimate of $f_3$, and at the same time smooth estimates of $f_1$ and $f_2$ were produced. The augmented dataset indicates an increased number of claims in the winter time. This is clearly reflected in the estimated results, where the first part and the last part of the estimated effect are higher than the rest of the curve. Imagine the realistic situation that a nonlife insurer on the first day of November has to produce budget expenses for the rest of the year. The classical multiplicative methodology is not able to reflect the two-month perspective of such a budget. Therefore, considerable work is being done manually in finance and actuarial departments of nonlife insurance companies to correct for such effects. With our new seasonal correction, costly manual procedures can be replaced by cost-saving automatic ones, eventually benefiting the prices all of us as end customers have to pay for insurance products.

Figure 5 depicts the resulting two-dimensional joint density. Notice that this two-dimensional density is clearly nonmultiplicative: the seasonal correction produces a visible deviation from the multiplicative shape. The nature of this deviation, however, is not immediately clear to the eye. Whether the deviation is pure noise, a seasonal effect or some other effect is not easy to tell from the full two-dimensional graph of the local linear density estimate, which is also presented in Figure 5. For the local linear estimate, we used $h_1 = h_2 = 0.03$. We tried other bandwidth choices such as 0.01 and 0.05, but found that the smaller one gave too rough an estimate and the larger one produced too smooth a surface. Our two-dimensional density estimate therefore illustrates why research into structured densities on nontrivial supports is crucial to extract information beyond the classical and simple multiplicative one.

Fig. 5. Local linear joint density estimate (left) and our estimate (right) for the model (2.1) obtained by applying the two-stage procedure to the augmented large claim data.

APPENDIX 

A.1. Proof of Theorem 1. Suppose that $(g_1, g_2, g_3)$ is a tuple of functions that are bounded away from zero and infinity with $\int_0^1 g_1(x)\,dx = \int_0^1 g_2(y)\,dy = 1$ and

$$f(x,y) = g_1(x) g_2(y) g_3(m_J(x+y)).$$

Furthermore, we assume that $g_1$ and $g_2$ are differentiable on $[0,1]$ and that $g_3$ is twice differentiable on $[0,1)$. For $j \in \{1,2,3\}$ define $\mu_j = \log f_j - \log g_j$. By assumption, we have

$$\mu_1(x) + \mu_2(y) + \mu_3(m_J(x+y)) = 0.$$

For $z \in [0,1)$, we choose $(x,y)$ in the interior of $\mathcal{I}$ with $m_J(x+y) = z$. Then we have that

$$0 = \frac{\partial^2}{\partial x\,\partial y}\bigl[\mu_1(x) + \mu_2(y) + \mu_3(m_J(x+y))\bigr] = J^2\mu_3''(z).$$

Thus, $\mu_3$ is a linear function. Furthermore, we have that $\mu_3(0) = \mu_3(1-)$. This follows by noting that $\mu_3(0) = -\mu_1(x) - \mu_2(y)$ for $(x,y) \in \mathcal{I}$ with $m_J(x+y) = 0$. Note that $m_J(x+y) = 0$ if and only if $x + y = l/J$ for some $l \ge 1$, if $(x,y)$ is in the interior of $\mathcal{I}$. After slightly decreasing $x$ and $y$ to $x + \delta x$ and $y + \delta y$ with small $\delta x < 0$, $\delta y < 0$, we have that $\mu_3(1 + J(\delta x + \delta y)) = -\mu_1(x + \delta x) - \mu_2(y + \delta y)$ since $m_J(x + y + \delta x + \delta y) = 1 + J(\delta x + \delta y)$. Thus, $\mu_3(0) = \mu_3(1-)$ follows from continuity of $\mu_1$ and $\mu_2$. We conclude that $\mu_3$ must be a constant function. Thus, $\mu_1(x) + \mu_2(y)$ is a constant function.

From assumption (A5), we get that $\mu_1(x)$ is constant on the intervals $[x_j, x_{j+1}]$. Because the union of these intervals is equal to $[0,1]$ we conclude that $\mu_1(x)$ is constant on $[0,1]$. Using again (A5) we get that $\mu_2(y)$ is constant on $[0,1]$. Because of the assumption that $\int_0^1 g_1(x)\,dx = \int_0^1 g_2(y)\,dy = 1$ and $\int_0^1 f_1(x)\,dx = \int_0^1 f_2(y)\,dy = 1$ we get that $f_1 = g_1$, $f_2 = g_2$ and $f_3 = g_3$.

This completes the proof.

A.2. Proof of Theorem 2. We first argue that $\mu_1$, $\mu_2$ and $\mu_3$ are a.e. equal to piecewise continuous functions on $(0,1)$, with a finite number of pieces. To see that $\mu_1$ is a.e. equal to a piecewise continuous function, we note that

$$\mu_1(x) = -\int_{I_2(x)} \bigl[\mu_2(y) + \mu_3(m_J(x+y))\bigr]\,dy \Big/ \mathrm{mes}(I_2(x)) \qquad \text{a.e. } x \in (0,1).$$

Here, because of (A3) and (A6), the right-hand side is a piecewise continuous function. Thus, $\mu_1$ is a.e. equal to a piecewise continuous function. In abuse of notation, we now denote the piecewise continuous function by $\mu_1$. By similar arguments, one sees that $\mu_2$ and $\mu_3$ are piecewise continuous functions (or more precisely a.e. equal to piecewise continuous functions). This implies that

$$\text{(A.1)} \qquad \mu_1(x) + \mu_2(y) + \mu_3(m_J(x+y)) = 0$$

for $(x, y, m_J(x+y)) \notin \{x_1, \ldots, x_{r_1}\} \times (0,1)^2 \cup (0,1) \times \{y_1, \ldots, y_{r_2}\} \times (0,1) \cup (0,1)^2 \times \{z_1, \ldots, z_{r_3}\}$ for some values $x_1, \ldots, x_{r_1}, y_1, \ldots, y_{r_2}, z_1, \ldots, z_{r_3} \in (0,1)$.

We now argue that $\mu_3$ is continuous on $[0,1)$. To see that $\mu_3$ is continuous at $z_0 \in [0,1)$, we choose $(x_0, y_0)$ in the interior of $\mathcal{I}$ such that $m_J(x_0 + y_0) = z_0$. This is possible because of assumption (A2). We can choose $x_0$ and $y_0$ such that $\mu_1$ is continuous at $x_0$ and $\mu_2$ is continuous at $y_0$. Thus, we get from (A.1) that $\mu_3$ is continuous at $z_0$. Similarly, one shows that $\mu_1$ and $\mu_2$ are continuous functions on $[0,1]$. This gives that

$$\text{(A.2)} \qquad \mu_1(x) + \mu_2(y) + \mu_3(m_J(x+y)) = 0$$

for all $x, y \in (0,1)$.

For $z_0 \in [0,1)$, we choose $(x_0, y_0)$ in the interior of $\mathcal{I}$ with $m_J(x_0 + y_0) = z_0$. Note that for $\delta x$ and $\delta y$ sufficiently small we get for $z_0 \in (0,1)$ that $m_J(x_0 + \delta x + y_0 + \delta y) = z_0 + J(\delta x + \delta y)$. This gives for $\delta x$ and $\delta y$ sufficiently small that

$$\mu_1(x_0 + \delta x) + \mu_2(y_0 + \delta y) + \mu_3(z_0 + J(\delta x + \delta y)) = 0.$$

With $\delta x$, $\delta y$ and $\delta y'$ sufficiently small, we get that

$$\mu_2(y_0 + \delta y) + \mu_3(z_0 + J(\delta x + \delta y)) = \mu_2(y_0 + \delta y') + \mu_3(z_0 + J(\delta x + \delta y')).$$
With the special choice $\delta x = -\delta y$, this gives

$$\mu_2(y_0 + \delta y) + \mu_3(z_0) = \mu_2(y_0 + \delta y') + \mu_3(z_0 + J(\delta y' - \delta y)).$$

Let $\gamma$ be a function defined by $\gamma(u) = \mu_3(z_0 + Ju) - \mu_3(z_0)$. From the last two equations, taking $u = \delta x + \delta y$ and $v = \delta y' - \delta y$, we get

$$\gamma(u + v) = \gamma(u) + \gamma(v)$$

for $u, v$ sufficiently small. This implies that, with a constant $c_{z_0}$ depending on $z_0$, we have $\gamma(u) = c_{z_0} u$ for $u$ sufficiently small; see Theorem 3 of Guillot, Khare and Rajaratnam (2013). Thus, we obtain $\mu_3(z) = a_{z_0} + b_{z_0} z$ with constants $a_{z_0}$ and $b_{z_0}$ depending on $z_0$ for $z$ in a neighborhood $U_{z_0}$ of $z_0$. Because every interval $[z', z'']$ with $0 < z' < z'' < 1$ can be covered by the union of finitely many $U_z$'s, we get that for each such interval it holds that $\mu_3(z) = a_{z',z''} + b_{z',z''} z$ for $z \in [z', z'']$ with constants $a_{z',z''}$ and $b_{z',z''}$ depending on the chosen interval $[z', z'']$.
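The additivity of $\gamma$ is obtained by subtracting one identity from the other. The following sketch spells out that step; it writes $\delta y'$ for the second perturbation of $y_0$, a labelling assumed here because the two perturbations have to be distinct for the identity to carry information.

```latex
% Both sides of the first identity equal -\mu_1(x_0+\delta x), hence
\mu_2(y_0+\delta y) - \mu_2(y_0+\delta y')
  = \mu_3\bigl(z_0 + J(\delta x+\delta y')\bigr)
  - \mu_3\bigl(z_0 + J(\delta x+\delta y)\bigr).
% The special choice \delta x = -\delta y in the same identity gives
\mu_2(y_0+\delta y) - \mu_2(y_0+\delta y')
  = \mu_3\bigl(z_0 + J(\delta y'-\delta y)\bigr) - \mu_3(z_0).
% Put u = \delta x+\delta y and v = \delta y'-\delta y, so that
% u + v = \delta x+\delta y'. Equating the two right-hand sides and using
% \gamma(w) = \mu_3(z_0 + Jw) - \mu_3(z_0) yields
\gamma(u+v) - \gamma(u) = \gamma(v).
```

This is the additive Cauchy equation on a neighborhood of zero; together with the continuity of $\mu_3$ established earlier, it forces $\gamma$ to be linear near zero, which is the conclusion drawn from the cited theorem.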

One can repeat the above arguments for $z_0 = 0$. Then we have that $m_J(x_0 + \delta x + y_0 + \delta y) = 1 + J(\delta x + \delta y)$ for $\delta x + \delta y < 0$ and $m_J(x_0 + \delta x + y_0 + \delta y) = J(\delta x + \delta y)$ for $\delta x + \delta y > 0$. Arguing as above with $\delta x + \delta y > 0$ and $\delta y' - \delta y > 0$ we get that $\mu_3(z) = a_+ + b_+ z$ for $z \in (0, z^+]$ for $z^+ > 0$ small enough with some constants $a_+$ and $b_+$. Similarly, we get by choosing $\delta x + \delta y < 0$ and $\delta y' - \delta y < 0$ that $\mu_3(z) = a_- + b_- z$ for $z \in (z^-, 1)$ for $z^- < 1$ large enough with some constants $a_-$ and $b_-$. Thus, we get that $\mu_3(z) = a + bz$ for $z \in (0,1)$ with some constants $a$ and $b$.

Furthermore, using continuity of $\mu_1$, $\mu_2$ and the relation $\mu_3(m_J(x+y)) = -\mu_1(x) - \mu_2(y)$ for $z = m_J(x+y)$ with $z$ in $(1 - \delta, 1)$ and $(0, \delta)$ with $\delta > 0$ small enough we get that $\mu_3(0) = \mu_3(1-)$. Thus, we have $b = 0$ and we conclude that $\mu_3$ is a constant function. This gives

$$\mu_1(x) + \mu_2(y) = -a$$

for all $(x,y) \in \mathcal{I}$. Now arguing as in the proof of Theorem 1 we get that $f_1 = g_1$, $f_2 = g_2$ and $f_3 = g_3$. This completes the proof.

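A.3. Proof of Theorem 3. Let $\hat{\mathcal{G}}'(\theta, g)(d, \delta)$ denote the derivative of $\hat{\mathcal{G}}$, defined in Section 4, at $(\theta, g)$ in the direction $(d, \delta)$. We note that we write $\hat{\mathcal{G}}'(0, 0)(d, \delta)$ simply as $\hat{\mathcal{G}}'(d, \delta)$ in Section 4. We use the sup-norm $\|(d, \delta)\|_\infty$ as a metric in the space $\mathbb{R}^3 \times \mathcal{S}$, defined by

$$\|(d, \delta)\|_\infty = \max\Bigl\{|d_1|, |d_2|, |d_3|, \sup_{u \in S_1}|\delta_1(u)|, \sup_{u \in S_2}|\delta_2(u)|, \sup_{u \in S_3}|\delta_3(u)|\Bigr\}.$$

Define $\mathcal{G}(\theta, g) = \mathcal{F}(1 + \theta, f \circ (1 + g))$, where $\mathcal{F}$ is defined in Section 4, and let $\mathcal{G}'(\theta, g)$ denote the derivative of $\mathcal{G}$ at $(\theta, g)$. In the setting where $\tilde f(x,y) - f(x,y) = O_p(\varepsilon_n)$ uniformly for $(x,y) \in \mathcal{I}$, we claim:

(i) $\sup_{\|(d,\delta)\|_\infty = 1} \|\hat{\mathcal{G}}'(0,0)(d,\delta) - \mathcal{G}'(0,0)(d,\delta)\|_\infty = O_p(\varepsilon_n)$;

(ii) The operator $\mathcal{G}'(0,0)$ is invertible and has bounded inverse;

(iii) The operator $\hat{\mathcal{G}}'$ is Lipschitz continuous with probability tending to one, that is, there exist constants $r, C > 0$ such that, with probability tending to one,

$$\sup_{\|(d,\delta)\|_\infty = 1} \|\hat{\mathcal{G}}'(\theta_1, g_1)(d,\delta) - \hat{\mathcal{G}}'(\theta_2, g_2)(d,\delta)\|_\infty \le C\|(\theta_1, g_1) - (\theta_2, g_2)\|_\infty$$

for all $(\theta_1, g_1), (\theta_2, g_2) \in B_r(0, 0)$, where $B_r(\theta, g)$ is a ball with radius $r > 0$ in $\mathbb{R}^3 \times \mathcal{S}$ centered at $(\theta, g)$.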


Theorem 3 basically follows from the above (i)-(iii). To prove the theorem 
using (i)-(iii), we note that claim (ii) with the definitions of 0 and f at (4.3) 
gives 0 — 1 = Op{en) and (f — f)/f = Op(en)- With (i) and (iii), this implies 
that 

(A.3) sup \\g\e - 1, (f - f)/f)(d, 5) - g'(0,0)(d,(5)|| = Op{en). 

ll(d,<5)||oc=l 


Now, from (ii) it follows that there exists a constant C > 0 such that the 
map Q'{0 — 1, (f — f)/f) is invertible and \\G'{0 — 1, (f — f)/f)“^(d, 5)||oo < 
C'||(d,5)||oo with probability tending to one. Also, (iii) is valid for all (0i,gi), 
(02;g2) £ B2r{0 — 1, (f — f)/f). Then we can argue that the solution of the 
equation 0(0,g) = 0, which is (0 — l,(f — f)/f), is within Con distance 
from (0 — 1, (f — f)/f), with probability tending to one, where C > 0 is a 
constant and = ||^(0 — 1, (f — f)/f)||oo- This follows from an application 
of the Newton-Kantorovich theorem; see Deimling (1985) or Yu, Park and 
Mammen (2008) for a statement of the theorem and related applications. 
To compute an, we note that 


(A.4) 


g{e - 1 , (f - f)/f) = ^( 0 , 0 ) + g'(o,o )(0 

= 0(0,0)+g'(O,O)(0 


l,{i-i)/{) + Op{el) 
l,{i-f)/{) + Op{el). 


For the first equation of (A.4), we have used (iii) and the facts that 0 — 1 = 
Op{en) and (f — f)/f = Op{Sn)- The second equation of (A.4) follows from 
the inequality 


||0'(O,O)(d,d) -0'(O,O)(d,d)||^ YC sup \fix,y) - f{x,y)\ • ||(d,<5)||^ 

x,y(^S 


for some constant C > 0. Now, 0(0,0) = .T(l, f) = (O"'", (£„, o/I)''')"''. From 
the definition (4.3), we also get 0'(O, O)(0 — 1, (f — f)/f) = (O"*", — 

This proves an = Op(en); so that ||(0 - 0, (f - f)/f)||oo = Op(£n). 

Claim (i) follows from the uniform convergence of / to / that is assumed 
in the theorem: snp(^^y-^^g\f{x,y) — f{x,y)\ =Op(e„). Below, we give the 
proofs of claims (ii) and (iii). 


LEE, MAMMEN, NIELSEN AND PARK 


Proof of claim (ii). For this claim, we first prove that the map G'(0, 0) 
is one-to-one. Suppose that ^'(0,0)(d, 5) = 0 for some d = {di,d 2 ,ds)~^ and 
S = ( 51 , 52 , 53 )'''. Then, by integrating the fourth component of G\0, 0)(d, 5), 
we find that 

0 = / fix,y)[5i{x) + 62 {y)+S 3 {mj{x + y))]dxdy = di / f{x,y)dxdy, 

Js Js 

where the first equation holds since the right-hand side equals, up to sign 
change, the third component of G'iO, 0)(d, S). Similarly, we get (^2 = ^3 = 0. 
Now, from 6;'(o,o)(o,5) = o we have 

0= [ { 0 ~^ ,S{x,y,z)~^)G'{ 0 ,S){x,y,z)dxdydz 

JS1XS2XS3 

= - f{x,y)[Siix) + 52 iy)+ 53 {mj{x + y))f dxdy. 


This implies 

(A.5) 5i(x)-L 52(2/)-L 53(771 j(x-Ly)) = 0 


a.e. on S. 


Arguing as in the proof of Theorem 2 using the last three equations of 
^'( 0 , 0 )( 0 ,5) = 0, we obtain 6j = 0 on Sj, 1 < j < 3. 

Next, we prove that the map ^'(0,0) is onto. For a tuple ( 0 , 77 ) with 
c = (ci,C 2 ,C 3 )'^ and r]{x,y,z) = {r]i{x),r]2{y),'niz)V, suppose that (( 0 , 77 ), 
^'(0,0)(d, 5)) = 0 for all (d, 5) G x S. This implies 

0 = / fix,y)r]i{x)dxdy, 

Js 

0 = / f{x,y)r] 2 {y)dxdy, 

Js 

0 = / fix,y)r] 3 {mj{x + y))dxdy, 

Js 

0 = / fix,y)[r]i{x)+T] 2 iy)+mimjix + y))]dy 
J J2{x) 

(A.e) +Cifi{x) + C 3 U^i{x), 

0 = / f{x,y)[niix) + V2{y) + r]3imjix + y))]dx 
J-h{y) 

+ C2f2{y)+C3fw,2{y), 

L{J) 

0=^ / f{x,{z + l)/J-x)[rii{x)+r] 2 {{z + l)/J-x)+ri 3 {z)]dx 
1=0 

+ C3/^,3(^)- 
From the first three equations of (A.6), we get $c_1 + \vartheta c_3 = 0$ by integrating the fourth equation. Similarly, we obtain $c_2 + \vartheta c_3 = 0$ and $c_3 = 0$ by integrating the fifth and the sixth equations. This establishes $c_1 = c_2 = c_3 = 0$. Putting these constant values back into (A.6), multiplying the right-hand sides of the fourth, fifth and sixth equations by $\eta_1(x)$, $\eta_2(y)$ and $\eta_3(z)$, respectively, and then integrating them gives

$$\int_S f(x,y)\bigl[\eta_1(x) + \eta_2(y) + \eta_3(m_J(x+y))\bigr]^2\,dx\,dy = 0.$$

Going through the arguments in the proof of $\mathcal{G}'(0,0)$ being one-to-one and now using the first two equations of (A.6) gives $\eta_1 = \eta_2 = \eta_3 = 0$. Note that the first two equations can be written as $\int f_{w,1}(x)\eta_1(x)\,dx = 0$ and $\int f_{w,2}(y)\eta_2(y)\,dy = 0$, and thus in the latter proof $f_{w,j}$ for $j = 1,2$ take the roles of $f_j$ in the former proof. The foregoing arguments show that $(0, 0)$ is the only tuple that is perpendicular to the range space of $\mathcal{G}'(0,0)$, which implies that $\mathcal{G}'(0,0)$ is onto.

To verify that the inverse map $\mathcal{G}'(0,0)^{-1}$ is bounded, it suffices to prove that the bijective linear operator $\mathcal{G}'(0,0)$ is bounded, owing to the bounded inverse theorem. Indeed, it holds that there exists a constant $C > 0$ such that $\|\mathcal{G}'(0,0)(d,\delta)\|_\infty \le C\|(d,\delta)\|_\infty$. This completes the proof of claim (ii). □

Proof of claim (hi). We first note that gi)(d,<5) — Q '{ 62 , 

g 2 )(d,d) = ^'(0i,gi)(d,<5)-^'(02,g2)(d,5). From this, we get that, for each 
given r > 0, 

||a'(0i,gi)(d,<5)-a'(02,g2)(d,<5)||^ < 6(1-br) max sup /«,,j(u)||g 2 -gi||oo 

^<3<^u&Sj 

for all (0i,gi), (02,g2) S .Br(0,0) and for all (d,<5) with ||(d,<5)||oo = 1- For 
this, we used the inequality 

sup |K(x,y,2;g2,<5) - «:(x,y,z; gi,<5)| 

(x,y,2)€Si XS 2 x53 

<3||5|| 

00 (2 + ||gi|| 00 T ||g2 ||oo) ||g2 gllloo- 

This completes the proof of (iii). □

A.4. Proof of Theorem 4. Let f^{x,y) be the hrst entry of f)^{x,y), 
where f]^ is defined as f) at (3.5) with b being replaced by b — Eh. Likewise, 
define f^{x, y) with b(x, y) being replaced by Eh{x, y) — {f{x, y),hi df{x, y)/ 
dx,h 2 df{x,y)/dyy. Then f{x,y) = f{x,y) + f^\x,y) + f^{x,y). Define 
and as fi at (4.2) with / — / being replaced by and , respectively, 
and fVf = iff/fi,fi/f 2 , fi/fs) along with O" - 1 = {Of - 1,- 1,- 1) 
for s = A and B as the solution of the backfitting equation (4.4) with fi
being replaced by /i®, subject to the constraints (4.5). Since the backfitting 
equation (4.4) is linear in /i, we get that f = f + f^ + f^ and 6 = 6^ — 1 + 6^. 

For simplicity, write the backhtting equation (4.4) as <5 = d + /x — T5 with 
an appropriate definition of the linear operator T. From the definitions of 
and 6^, we have f"^/f = 6^ — 1 + — T(f^/f). From Lemma 1 below, 

we obtain 

F/f - = 0^ - 1 - T(F/f - /i^) + Op(n-2/^) 

uniformly on Si x S'2 x S3. This implies f"^/f — fi^ = Op{n~‘^/^) uniformly 
on Si X S2 X S3 and 0"^ — 1 = 

Now, for the deterministic part f^, recall the dehnitions of and 
at (4.6) and thereafter, respectively. Let According to 

Lemma 1, r„ = o(n“^/^) on S( x S2 x S3, where S'- is a subset of Sj with 

the property that mes(Sj — S'j) = 0{n~^/^). We also get r„ = 0(n“^/^) on 
Si X S2 X S3. This implies T(r„) = o(n“^/®), so that 

f^/f - r„ = 0^ - 1 + - T(f^/f - r„) + Op(n“^/^) 

uniformly on Si x S2 x S3. Thus, (f^/f,0 — 1) equals the solution of the 
backfitting equation <5 = d + — Td, up to an additive term whose 

jth component has a magnitude of an order o(n“^/^) on S'- and 0(n“^/^) 
on the whole set Sj. 

The asymptotic distribution of {{fj{uj) — fj{uj))/fj{uj ): 1 < j < 3) for 
hxed Uj G Sj^c FI S° is then readily obtained from the above results. The 
asymptotic mean is given as the solution {5j{uj ): 1 < j < 3) of the backhtting 
equation (4.4) with fij being replaced by subject to the constraint 

(4.5). The asymptotic variances are derived from those of where 

= [ f^{x,y)dy, 

JJiix) 

P'tiy) = fw, 2 {y)~^ [ y) dx, 

L{J) . 

yi{z) = / f^{x,{z + l)/J -x)dx 

1=0 

and f^{x, y) = ELi [Kh, {X^ - x)Kh, (Ti - y)W^ -E{Kh, (W - x)Kh, (T* - 
y)Wi)]. This is due to (A.9), (A.10) and the corresponding property for 
in the proof of Lemma 2 below. 

To compute var(/ij^(ui)), we note that, due to the assumption (A7) and 
thus from Lemma 1, we may hnd constants C > 0 and a > 1/2 such that 
(u; Chi + ^2) C ^2(^1! ^2) for all u with |tt — ui| < /ii, if n is sufficiently 
large. Note that J 2 {w, Chf + h 2 ) is inside J|(tt;/i 2 ) at a depth C/i". Then it
can be shown that, for all {u, v) with \u — ui\ <hi and v € J^iw, Chf -|- / 12 ), 
the set {(f — y)/h 2 : 2 / G J 2 (rti)} covers the interval [—1,1], the support of the 
kernel K. This implies that Kh^{u — u{)v{ui,v) = Kh^{u — ui) for all {u,v) 
with |u — ui| <hi and u G J 2 (u; C/i" -|- / 12 ), where z/(ui,u) = Kh^iv — 

y) dy. Using this and the fact that the Lebesgue measure of the set difference 
J 2 {u) — J|(u; Chi + ^ 2 ) has a magnitude of order gg^ 

vav{fLf{ui)) 

= fw,i{ui)~^n~^hi^ -^K ty{ui,vff{u, v) du dv + 0{n~^) 

2 


:/u,,i(ui) ^hi ^ f 

J\i 


-K 


u — Ul 


\u-ui\<hi JJ^{u-,Ch^+h2) V 

X f{u,v) dvdu 


v{ui,v) 


-|-o(re 

= fw,i{'>^i)~'^n~^hi^ J f{u,v)dudv+ o{n~^h~^) 

= n~^hi^ J K‘^(u) du + o{n~^h~^). 

The last equation holds since ui G Si^c, so that is continuous at ui, and 
it is a fixed point in the interior of Si. Similarly, we obtain 

var(/i^(n 2 )) = n~^h 2 ^fw, 2 {u 2 )~^ J K‘^{u) du + o{n~^h~^). 

The calculation of the asymptotic variance of is more involved 

than those of var(/2^(uj)) for j = 1,2. For this, we observe that, if I 7 ^ I', 
then for any given z G [0,1] and {u, v) £l we have 

7Ti^l>{z,U,V,X,x') 

= Kh^ {u - x)Kh^ -b Kh^ {u - x')Kh^ -b x^ 

= 0 

for all X, x' except the case {z + V}/J — x = {z + 1')/J — x' , if n is sufficiently 
large. This implies that 

var(/ 2 ^(u 3 )) 

CJ) , 


= fw,3M 


1=0 J-hiCi) J-hiCa) JS 


TTi {u 3 ,u, V, X, x') f {u, v) du dv dx dx' 


+ 0{n 


- 1 ^ 
where tti = From Lemma 1 again, we may find constants C > 0 and
a > 1/2 such that J^ix^Ch'^ + /i 2 ) C J|(u;/i 2 ) for all x,u£ PI Si 

with |u — a;| < hi, 1 <k< Li. Define a subset J'^i{u 3 ) of [0,1] such that 
X G J^i{u3) if and only if a; G Jsiiu^ + J(/i 2 + Chf)t) for all t G [—1,1]. Then, 
for a given u G Si^c, it follows that 

[-1,1] C I-- :UG J2(u)j 


for all X G J^i{u 3 ) such that |x — tt| < hi and x lies in the same partition 
(Ofc-ijOfc) as u. This holds since x G J^iiz) implies {z + l)/J — x £ J 2 {x). 
This entails that, for x G n Si^{hi), 


/ Tri{u3,u,v,x,x') dudv 

Js 

= [ K{t)K{s)h^^K(t + ^^^^^']h 

V / 

= {K * K)h, (x - x') {K * K)h, {x - x'), 


Kis + 


X — X 


dt ds 


where $K * K$ denotes the convolution of $K$ defined by $K * K(u) = \int K(t)K(t + u)\,dt$. Here and below, $S^o_{j,c}(h)$ for a small number $h > 0$ denotes the set of $x \in S_{j,c}$ such that $x + ht$ belongs to $S_{j,c}$ for all $t \in [-1,1]$.
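To illustrate the convolution $K * K(u) = \int K(t)K(t+u)\,dt$ that enters the variance of the third component, the sketch below evaluates it numerically for the Epanechnikov kernel. The choice of this kernel is an assumption made for illustration (it is symmetric, Lipschitz continuous and supported on $[-1,1]$, as Theorem 4 requires, but the text does not say which kernel was used); the function names and the midpoint-rule grid are likewise illustrative.

```python
def epanechnikov(t):
    """Epanechnikov kernel: symmetric, Lipschitz, supported on [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def kernel_autoconvolution(u, kernel=epanechnikov, grid=4000):
    """Midpoint-rule approximation of (K*K)(u) = int K(t) K(t + u) dt.

    The integration runs over [-1, 1], the support of K(t).
    """
    h = 2.0 / grid
    total = 0.0
    for i in range(grid):
        t = -1.0 + (i + 0.5) * h
        total += kernel(t) * kernel(t + u) * h
    return total
```

At $u = 0$ the convolution reduces to $\int K^2(t)\,dt$, which equals 3/5 for this kernel and is the constant that appears in the variances of the first two components. The convolution vanishes for $|u| \ge 2$, since the supports of $K(t)$ and $K(t+u)$ no longer overlap.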

Because of the assumption (A7) and the fact that is a fixed point in 
‘53,0 we get that nies[J 3 ;(u 3 ) A J 3 ;(u 3 )] is of order o(l). This and the 

foregoing arguments give 


var(/i^(u 3 )) 


L{J) 

fw,3{u3) / 

l—n dJv(us) JJL 


1=0 dJaiM d J^i{u 3 )nSf^^{hi) Js 


TTl {U3, U, V, X, x') du dv 


^ f{ X, ^ — x] dx dx' 


+ o(n-^/®) 


L(J) 

fw,3{u3) ^ X] / / * K)hi{x - x'){K * K)h 2 ix - x') 

1=0 dJiii^a) dJ3i(u3) 


X f( X, - X ) dxdx' 
Let J^i{u3]2hi) denote a subset of Jsiiu^) such that x G if and

only if a: — 2hit G Jsiiu^) for all t G [—1,1]. Then 
LiJ) 

Y, / / {K * K)h,{x - x'){K * K)h,{x - x') 

1=0 J Juiui) 


X / ( 3:, ^ ~ ® ) dx' dx 


L{J) „ 

1=0 Jjii(u3\2hi) 


f ( X, - x ] dx 


X J [K*K{t)][K*K{hit/h2)]dt + 0{l) 


L(J) f 


1=0 ■^'^3i(“3) 
2 


/( x, -x]dx 


X J [K*K{t)][K*K{hit/h2)]dt + 0{l) 

/■2 

=J ^[K*K{t)][K*K{hit/h2)]dt + 0{l). 

This with Lemma 3 below completes the proof of Theorem 4. 


Lemma 1. Under the condition (A7) with the constants $C > 0$ and $\alpha > 1/2$, it follows that (i) $J_2^o(u_1; Ch_1^\alpha + h_2) \subset J_2^o(u_2; h_2)$ for any $u_1, u_2 \in (a^1_{k-1}, a^1_k) \cap S_1$ with $|u_1 - u_2| \le h_1$, $1 \le k \le L_1$; (ii) $J_1^o(u_1; Ch_2^\alpha + h_1) \subset J_1^o(u_2; h_1)$ for any $u_1, u_2 \in (a^2_{k-1}, a^2_k) \cap S_2$ with $|u_1 - u_2| \le h_2$, $1 \le k \le L_2$.

Proof. We apply (A7) to the choice $\varepsilon_n = h_1$. Suppose a point $y \in J_2^o(u_1; Ch_1^\alpha + h_2)$. This implies $y + h_2 t + Ch_1^\alpha s \in J_2(u_1)$ for all $s, t \in [-1,1]$. This holds since $|(h_2 t + Ch_1^\alpha s)/(h_2 + Ch_1^\alpha)| \le 1$ for all $s, t \in [-1,1]$. By (A7), $y + h_2 t \in J_2^o(u_1; Ch_1^\alpha) \subset J_2(u_2)$ for all $t \in [-1,1]$, so that we get $y \in J_2^o(u_2; h_2)$. The proof of (ii) is the same. □

Lemma 2. Under the conditions of Theorem f, It follows that Tp,^ = 
Op{n~^/^) uniformly on SiX S 2 X S^. Furthermore, p^ + o{n~‘^^^) 

uniformly on Sf^hi) x ^(^ 2 ) x Sg for a sufficiently large 

C > 0, and p^{u) = n~‘^/'^p^{u) -|- 0(n“^/^) uniformly on 5i x 52 x So- 


Proof. From the standard theory of kernel smoothing, it follows that 
(A.7) sup |/^(T,y)| =Op(n"^/^°Vlogn). 

{x,y)&S 
Also, we have A{x,y) = diag(l, *^ 2 ) for all {x,y) with x G S°^{hi) and
y G J 2 (x; Ch" + /12), where C > 0 and a > 1/2 are the constants in assump¬ 
tion (A7) and 1^2 = f v?K{u)du. Define J = {{x,y) € S:x & S/^^{hi),y G 
J2{x;Chi + /i 2 )}- From the simplification of A{x,y) on we get 

(A.8) f^{x,y) = f^{x,y), {x,y)£j. 


(A.9) 


From (A.7) and (A.8), we have 

ytix) = fifix) + Op(n-(3+2D/io0^) 

uniformly for x G S°^{hi), 
where r = min{l, a}. Note that r > 1/2. Similarly, we get 
A2 ( 2 /) = ~4iy) + Op(n-(3+2O/io0^) 

(A.IO) 


uniformly for y G ^(^ 2 )- 

For the treatment of we first note that A(x, {z + l)/J — x) = diag(l, 1 ^ 2 , ^^ 2 ) 
for all X G n where the set J^i{z) is defined in the proof of 

Theorem 4. In fact, 


(A.11) {x,{z + 1)/J — x) £ J' if and only if x £ J^i{z) (1 
This implies that, for all 0 < Z < L(J), 

(A. 12) - x^ = - x^, x£j^i{z)r]S/,,{hi). 

Due to the condition (A7) we can take a constant C' > 0 such that, uniformly 
for 2 G Sg (,(C"n“^/®), we have nies[J 3 z( 2 ;)AJ 3 ;( 2 ;)] = 0(n“^/®). Then, 

from (A.7) and (A.12) we have 



f^{x,{z + l)/J 


x) dx 


L{J) . 

= / f^(x,(z + l)/J — x)dx 


L(J) 

-L Op(n“^/^°v^logn) ines[J3i{z)A{J^i{z) n Sl^{hi))] 
uniformly for 2: G S'^ ^{Cn~'^. This implies = / 2 ^(z) -|-Op(n“^/®) uni¬
formly for 2 G S'g This together with (A. 9 ), (A. 10 ) and Lemma 3 

gives Tfi^ = Op{n~'^/^) uniformly on Si x S2 x S3, since = Op{n~‘^/^) 
uniformly on the set and the Lebesgue measures of the set differences Si — 
Si phi) and S2 — S2^Ph2) are of order and that of S3 — S^pC'n~'^^p 

is of order . 

To prove the second part of the lemma, recall that A{x,y) = diag(l, 1^2,1^2) 
on J'. In fact, for {x,y) G 

^hAu-x)KhPv-y)dudv = Q 

whenever j or k is an odd integer. This implies f^{x,y) = {x,y) + 

o(n“^/^) uniformly for {x, y) G J . We also get f^{x, y) = uniformly 

for (x, y) G S. We apply the same arguments as in the proof of the first part, 
to obtain 


/lf(x)=n-2/^/lf(x) + o(n-2/5) 

h2 {y) = (y) + 

From (A.ll), it follows that 


uniformly for x G 5 ° 
uniformly for ?/ G S2 c(^2)- 



for all {x,z) such that x G J^pz) n S°^(/ii) and 2; G S3. From this and the 

fact that mes[J3i(2) A J3j(z)] = o(l) uniformly for 2 G S^pC'n~^/p, 

we obtain 

/if (2) = n“^/^/if (2) -|- uniformly for 2 G Sf 

where C' is the constant C in the proof of the hrst part. This completes the 
proof of the lemma. □ 


Lemma 3. Under the conditions of Theorem 4, it follows that

$$\sup_{u \in S_j} |\hat\mu_j^A(u)| = O_p(n^{-2/5}\sqrt{\log n}), \qquad 1 \le j \le 3.$$


Proof. We give the proof for /if only. The others are similar. For (x,y) 
with X G Si and y G J|(x; Chf + /12), we have 

f^{x,y) = Ti{x)di{x,y) + ip2{x)a2{x,y) + ip3{x)a3{x,y), 
where ipj for j = 1,2,3 are some bounded functions, ai = boo, = ^lo and
as = 6oi with 


bjk{x,y) = n 


2 = 1 


Xi - xV/Yi - y 


hi 


ho 




x)Kh,{Yi-yW^ 


-E 


Xi-x 

hi 


j 



k 

KhA^i 


x)KhAY^-y)Wi 


The lemma follows from (A.7) and using 
sup mes[J 2 (a;) — J^ix] Ch^ + / 12 )] = 

xdSi 


sup 

xGSi 



Op{n ^/^i/logn). 


l<j<3. 


□ 